Tidy pipelines and structured output

We will show both unstructured and structured pipelines, using open models:

- deepseek-chat (DeepSeek)
- llama-3.1-8b-instant (Groq)
- openai/gpt-oss-20b (Groq)

llm_fn: unstructured (DeepSeek)

words <- c("excellent", "awful", "fine")
out <- llm_fn(
  words,
  prompt  = "Classify '{x}' as Positive, Negative, or Neutral.",
  .config = cfg_ds,
  .return = "columns"
)
out

llm_fn: unstructured (Groq)

out_groq <- llm_fn(
  words,
  prompt  = "Classify '{x}' as Positive, Negative, or Neutral.",
  .config = cfg_groq1,
  .return = "columns"
)
out_groq

llm_fn_structured: schema-first (DeepSeek)

schema <- list(
  type = "object",
  properties = list(
    label = list(type = "string", description = "Sentiment label"),
    score = list(type = "number", description = "Confidence 0..1")
  ),
  required = list("label", "score"),
  additionalProperties = FALSE
)

out_s <- llm_fn_structured(
  x = words,
  prompt  = "Classify '{x}' as Positive, Negative, or Neutral with confidence.",
  .config = cfg_ds,
  .schema = schema,
  .fields = c("label", "score")
)
out_s

llm_mutate: unstructured (Groq)

df <- tibble::tibble(
  id   = 1:3,
  text = c("Cats are great pets", "The weather is bad", "I like tea")
)

df_u <- df |>
  llm_mutate(
    answer  = "Give a short category for: {text}",
    .config = cfg_groq,
    .return = "columns"
  )

df_u

llm_mutate: shorthand syntax

The shorthand lets you give the output column name and the prompt in a single argument:

df |>
  llm_mutate(
    category = "Give a short category for: {text}",
    .config = cfg_groq
  )
# Equivalent to: llm_mutate(category, prompt = "Give...", .config = cfg_groq)

Or with a named vector of role-tagged messages (system and user):

df |>
  llm_mutate(
    classified = c(
      system = "You are a text classifier. One word only.",
      user = "Category for: {text}"
    ),
    .config = cfg_ds
  )

llm_mutate with .structured flag

Enable structured output directly in llm_mutate() using .structured = TRUE:

schema <- list(
  type = "object",
  properties = list(
    category = list(type = "string"),
    confidence = list(type = "number")
  ),
  required = list("category", "confidence")
)

# Structured output requested via the .structured flag
df |>
  llm_mutate(
    structured_result = "{text}",
    .config = cfg_ds,
    .structured = TRUE,
    .schema = schema
  )

This is equivalent to calling llm_mutate_structured() and supports all the same shorthand syntax.

Soft structured output with tags

When a strict JSON schema is unnecessary, request simple XML-like tags and let LLMR parse them into columns. In the ordinary one-row-per-call mode below, tags should be flat (not nested); the row-batching mode further down deliberately introduces one level of nesting and is documented there.

cities <- tibble::tibble(city = c("Cairo", "Lima", "Seoul"))

cities |>
  llm_mutate(
    geo = "Where is {city}? Give country and continent in their own tags.",
    .config = cfg_groq1,
    .system_prompt = paste(
      "Use XML tags to specify different parts of the answer, but do not nest tags.",
      "Return <country>...</country> and <continent>...</continent>."
    ),
    .tags = c("country", "continent")
  )

The result includes tags_ok, tags_data, and one column per requested tag. Use llm_parse_tags_col() to parse an existing response column.
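
To make the flat-tag idea concrete, here is a minimal base-R sketch of how a tagged reply could be turned into columns. The helper parse_flat_tags() is hypothetical and written only for illustration; it is not LLMR's parser, and it assumes tag names contain no regex metacharacters.

```r
# Hypothetical helper, not part of LLMR: pull flat <tag>...</tag> values
# out of a single reply and return them as a one-row data frame.
parse_flat_tags <- function(text, tags) {
  values <- vapply(tags, function(tag) {
    # (?s) lets "." match newlines; .*? keeps the match non-greedy.
    pattern <- sprintf("(?s)<%s>(.*?)</%s>", tag, tag)
    m <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
    if (length(m) < 2) NA_character_ else trimws(m[2])
  }, character(1))
  out <- as.data.frame(as.list(values), stringsAsFactors = FALSE)
  out$tags_ok <- !anyNA(values)  # TRUE only if every requested tag was found
  out
}

# A made-up reply, standing in for what a model might return.
reply <- "<country>Egypt</country> <continent>Africa</continent>"
parse_flat_tags(reply, c("country", "continent"))
```

A reply that omits a tag yields NA in that column and tags_ok = FALSE, which mirrors the kind of per-row success flag described above.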

Row batching: many rows per call

By default LLMR sends one request per row. With .rows_per_prompt > 1, several rows are packed into a single request: each row’s prompt is wrapped in a numbered tag (<row_1>...</row_1>, <row_2>...</row_2>, …), the block is appended to the message, and the model is asked to answer each item inside a matching numbered tag. LLMR splits the reply back into the original rows. .rows_per_prompt = Inf sends the whole frame in one call.

cities |>
  llm_mutate(
    geo = "Where is {city}? Give country and continent in their own tags.",
    .config = cfg_groq1,
    .tags = c("country", "continent"),
    .rows_per_prompt = 3
  )
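
The wrapping protocol described above can be sketched in base R. The helpers wrap_rows() and split_rows() are hypothetical names used only for this illustration; LLMR's internal implementation and exact instruction text may differ.

```r
# Hypothetical helpers illustrating the numbered-tag protocol; not LLMR's code.

# Wrap each row's prompt in <row_i>...</row_i> and join into one block.
wrap_rows <- function(prompts) {
  idx <- seq_along(prompts)
  paste(sprintf("<row_%d>%s</row_%d>", idx, prompts, idx), collapse = "\n")
}

# Split a reply back into n answers; rows the model skipped come back as NA.
split_rows <- function(reply, n) {
  vapply(seq_len(n), function(i) {
    pattern <- sprintf("(?s)<row_%d>(.*?)</row_%d>", i, i)
    m <- regmatches(reply, regexec(pattern, reply, perl = TRUE))[[1]]
    if (length(m) < 2) NA_character_ else trimws(m[2])
  }, character(1))
}

prompts <- sprintf("Where is %s?", c("Cairo", "Lima", "Seoul"))
cat(wrap_rows(prompts), "\n")

# A made-up reply in which the model skipped row 2.
reply <- "<row_1>Egypt</row_1>\n<row_3>South Korea</row_3>"
split_rows(reply, 3)
```

Because answers are matched by their row number rather than by position, a reordered reply still maps back correctly, and a missing row is visible as NA instead of silently shifting the rows after it.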

A few points worth keeping in mind:

- Two notions of “batch”. This generative row batching is unrelated to get_batched_embeddings(), which splits many texts across several embedding calls. The .rows_per_prompt argument applies only to generative calls.
- One level of nesting in tag mode. Inside each <row_i> block the model emits the requested field tags, so batched tag output is intentionally nested one level. This is the opposite of the flat-tag guidance for single-row calls; LLMR adjusts the instruction automatically.
- Structured output. .structured = TRUE together with .rows_per_prompt > 1 asks for a single JSON object {"results":[{"row":i, ...}]} and maps each element back by its integer row. It emits a one-time warning, because it relies on the model following the protocol and replaces strict provider-side schema validation with local parsing.
- Fault tolerance. Rows that the model drops, reorders, duplicates, or truncates are detected and re-issued according to .rowpack_recovery (by default the unresolved rows are retried at half the batch size, recursively, down to single rows). Unrecoverable rows are returned as NA with a diagnostic finish reason.
- Cost. Batching reduces the number of requests and the repeated system-prompt overhead, but it only pays off when the model reliably follows the wrapping protocol. Prefer capable models at temperature = 0, and modest batch sizes.
- Diagnostics. When batching actually groups rows, llm_mutate() adds <col>_batch, <col>_bn, and <col>_bi columns identifying the batch, its size, and the row’s position within it. Token counts and wall-clock duration are attributed once per batch (on its first resolved row) so that summing those columns is correct. One caveat: when a batch reply is entirely unusable and its rows succeed only through recovery calls, the failed call’s spend has no successful row to land on, so sums can slightly undercount in heavy-recovery runs.
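
The default recovery strategy mentioned under fault tolerance (retry unresolved rows at half the batch size, down to single rows) can be sketched as follows. This is an illustrative base-R sketch under stated assumptions, not LLMR's implementation: resolve_rows() and flaky_call() are hypothetical, the batch size is assumed finite, and flaky_call() merely simulates a model that drops rows.

```r
# Hypothetical sketch of halving recovery; LLMR's actual logic may differ.
# `call_batch` stands in for one model call: it takes a vector of rows and
# returns one answer per row, with NA for rows the reply failed to resolve.
resolve_rows <- function(rows, batch_size, call_batch) {
  answers <- rep(NA_character_, length(rows))
  groups <- split(seq_along(rows), ceiling(seq_along(rows) / batch_size))
  for (g in groups) {
    res <- call_batch(rows[g])
    answers[g] <- res
    unresolved <- g[is.na(res)]
    if (length(unresolved) > 0 && batch_size > 1) {
      # Retry only the unresolved rows, at half the batch size.
      answers[unresolved] <- resolve_rows(
        rows[unresolved], max(1, batch_size %/% 2), call_batch
      )
    }
  }
  answers  # rows still NA here were unrecoverable even as single-row calls
}

# Simulated model: drops the last row of any multi-row batch,
# and never manages to answer the row "x".
flaky_call <- function(rows) {
  out <- toupper(rows)
  if (length(rows) > 1) out[length(rows)] <- NA
  out[rows == "x"] <- NA
  out
}

resolve_rows(c("a", "b", "x", "d"), batch_size = 4, call_batch = flaky_call)
```

In this simulation, "d" is recovered once it is retried on its own, while "x" stays NA, which corresponds to the unrecoverable-row case described above.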

Preview before you spend, summarize after

llm_preview() renders exactly what llm_fn() / llm_mutate() would send, without any API call and without reading or encoding files. It flags problems up front: missing files, a "file" role combined with .rows_per_prompt > 1, an embedding config with row batching, and so on. The batch plan columns show how rows would be grouped into calls.

df <- data.frame(text = c("good", "bad", "fine"), stringsAsFactors = FALSE)
LLMR::llm_preview(df, prompt = "Sentiment of: {text}", .rows_per_prompt = 2)

After a run, llm_usage() summarizes outcomes and token totals, and llm_failures() lists the rows that failed or were truncated. Both read the diagnostic columns that llm_mutate() and call_llm_par() already produce. llm_usage() reports tokens, not dollars: multiply by your provider’s current per-token prices yourself.

out <- df |>
  llm_mutate(sentiment = "One-word sentiment for: {text}", .config = cfg_groq)

llm_usage(out)       # counts + sent/received/total/reasoning tokens
llm_failures(out)    # which rows failed or were truncated, and why
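
Since llm_usage() reports tokens rather than dollars, the conversion is a short calculation you do yourself. The sketch below uses made-up token totals and made-up prices; the column names sent_tokens and received_tokens are illustrative assumptions, not the documented output of llm_usage().

```r
# Hypothetical token totals; substitute the figures from your own run.
# Column names are illustrative only.
usage <- data.frame(
  sent_tokens     = 1200,
  received_tokens = 300
)

# Made-up prices in USD per million tokens; look up your provider's
# current rates before relying on any estimate.
price_per_million <- c(sent = 0.05, received = 0.08)

cost_usd <- usage$sent_tokens     / 1e6 * price_per_million[["sent"]] +
            usage$received_tokens / 1e6 * price_per_million[["received"]]
cost_usd
```

Input and output tokens are usually priced differently, so keep the two totals separate until the final sum.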

For a call_llm_par() result you can re-run only the failures with llm_par_resume().