querychat automatically gathers information about your table to help the LLM write accurate SQL queries. This includes column names and types, numerical ranges, and categorical value examples. (All of this information is provided to the LLM as part of the system prompt – a string of text containing instructions and context for the LLM to consider when responding to user queries.)
Importantly, we are not sending your raw data to the LLM and asking it to do complicated math. The LLM only needs to understand the structure and schema of your data in order to write SQL queries.
You can get even better results by customizing the system prompt in three ways:
For full visibility into the system prompt that querychat generates for the LLM, you can inspect the system_prompt field. This is useful for debugging and understanding exactly what context the LLM is using:
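For example (a minimal sketch, assuming the penguins data frame used elsewhere in this article is available):

```r
library(querychat)

# Create a querychat object for the penguins data
qc <- querychat(penguins)

# Print the generated system prompt to see exactly what the LLM receives
cat(qc$system_prompt)
```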
By default, the system prompt contains the following components:

- The table schema (column names, types, numerical ranges, and categorical value examples)
- Your data description (data_description)
- Your extra instructions (extra_instructions)

If your column names are descriptive, querychat may already work well without additional context. However, if your columns are named x, V1, value, etc., you should provide a data description. Use the data_description parameter for this:
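For example, a short description can be passed directly (a sketch modeled on the querychat() call shown later in this article; that data_description accepts a plain string is an assumption):

```r
library(querychat)

# A free-form description of the dataset and its columns
description <- "
This dataset contains information about Palmer Archipelago penguins.
- species: Penguin species (Adelie, Chinstrap, Gentoo)
- body_mass_g: Body mass in grams
"

qc <- querychat(
  penguins,
  data_description = description
)
```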
querychat doesn’t need this information in any particular format – just provide what a human would find helpful:
<!-- data_description.md -->
This dataset contains information about Palmer Archipelago penguins,
collected for studying penguin populations.
- species: Penguin species (Adelie, Chinstrap, Gentoo)
- island: Island where observed (Torgersen, Biscoe, Dream)
- bill_length_mm: Bill length in millimeters
- bill_depth_mm: Bill depth in millimeters
- flipper_length_mm: Flipper length in millimeters
- body_mass_g: Body mass in grams
- sex: Penguin sex (male, female)
- year: Year of observation

You can add custom instructions to guide the LLM’s behavior using the
extra_instructions parameter:
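For example, instructions can be supplied as a character vector, one guideline per element (a sketch; that extra_instructions accepts a vector is an assumption):

```r
# One instruction per element
instructions <- c(
  "Use British spelling conventions",
  "Stay on topic and only discuss the data dashboard",
  "Refuse to answer unrelated questions"
)

qc <- querychat(
  penguins,
  extra_instructions = instructions
)
```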
Or as a string:
instructions <- "
- Use British spelling conventions
- Stay on topic and only discuss the data dashboard
- Refuse to answer unrelated questions
"
qc <- querychat(
  penguins,
  extra_instructions = instructions
)
cat(qc$system_prompt)

LLMs may not always follow your instructions perfectly. Test extensively when changing instructions or models.
If you want more control over the system prompt, you can provide a custom prompt template using the prompt_template parameter. This is intended for advanced users who want to fully customize the LLM’s behavior. See the querychat reference for details on the available template variables.
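As a rough sketch (the template variable names below are hypothetical, not confirmed querychat template variables — check the querychat reference for the real ones, and whether prompt_template accepts a string rather than a file path):

```r
# Hypothetical template; {{schema}} and {{data_description}} are
# placeholder names used for illustration only.
template <- "
You are a SQL assistant for the following table:
{{schema}}

{{data_description}}
"

qc <- querychat(
  penguins,
  prompt_template = template
)
```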