Structured outputs refer to techniques and tools that constrain LLM generation to produce well-formed data in a specified format (JSON, XML, SQL, code, etc.) rather than free-form text. This capability is essential for integrating LLMs into software systems where downstream components require predictable, parseable responses.
LLMs natively produce free-form text, but production applications need:
Without structured output guarantees, applications resort to brittle regex parsing, retry loops, and manual validation, all of which degrade reliability and increase latency.
The simplest approach: instruct the model to output a specific format via the prompt.
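A minimal sketch of the prompt-based approach, assuming a review-extraction task: the format is requested in the prompt and the reply is checked after the fact. The key names, `build_prompt`, and `parse_reply` are hypothetical and chosen only for illustration. No model is called here; the reply is passed in as a plain string.

```python
import json

# Hypothetical format instructions embedded in the prompt (illustrative only).
FORMAT_INSTRUCTIONS = (
    "Respond with a single JSON object and nothing else. "
    'Use exactly these keys: "title" (string), "rating" (number), '
    '"recommended" (boolean).'
)

# Expected Python type for each key after JSON decoding.
REQUIRED_TYPES = {"title": str, "rating": (int, float), "recommended": bool}


def build_prompt(text: str) -> str:
    """Prepend the format instructions to the text to be analyzed."""
    return f"{FORMAT_INSTRUCTIONS}\n\nReview:\n{text}"


def parse_reply(reply: str) -> dict:
    """Parse a model reply and raise ValueError if it is off-format."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    for key, expected in REQUIRED_TYPES.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        value = data[key]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and expected is not bool:
            raise ValueError(f"wrong type for {key}")
        if not isinstance(value, expected):
            raise ValueError(f"wrong type for {key}")
    return data
```

Nothing in the prompt prevents the model from adding prose around the JSON or dropping a key, so the caller still has to handle the `ValueError` path.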
Model providers offer native function-calling interfaces where the model selects and populates structured function parameters:
Function calling has become the de facto standard for structured agent interactions, serving as the backbone of tool utilization in modern agent frameworks.
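The sketch below illustrates the general shape of function calling: a tool is described with a JSON Schema, the model returns a function name plus JSON-encoded arguments, and the application dispatches the call. The tool definition is modeled on the layout used by OpenAI-style chat APIs; the `get_weather` tool, the registry, and the simplified call dictionary are hypothetical, and the model's output is simulated rather than fetched from a provider.

```python
import json

# Tool description in JSON Schema form (hypothetical tool, illustrative only).
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}


def get_weather(city: str, unit: str = "celsius") -> str:
    """Stub implementation; a real tool would query a weather service."""
    return f"22 degrees {unit} in {city}"


# Maps tool names the model may select to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch(tool_call: dict) -> str:
    """Run the function named in a tool call with its JSON-encoded arguments."""
    name = tool_call["name"]
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    arguments = json.loads(tool_call["arguments"])
    return TOOL_REGISTRY[name](**arguments)


# Simulated model output: a selected function and its populated parameters.
simulated_call = {"name": "get_weather", "arguments": '{"city": "Oslo"}'}
result = dispatch(simulated_call)
```

The arguments arrive as a JSON string, so they may still need validation against the declared schema before the function runs.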
Constrained decoding intervenes during token generation to mask invalid tokens, guaranteeing schema compliance.
How it works: At each token generation step, a finite-state automaton or pushdown automaton derived from the target schema masks logits for tokens that would violate the schema. This guarantees structural validity without post-processing.
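The masking step can be sketched with a toy example. The code below assumes an eight-token vocabulary and a hand-written finite-state automaton that accepts only `{"ok":true}` or `{"ok":false}`; the `fake_logits` function is a stand-in for a real model and deliberately prefers an invalid token. None of these names come from a real library.

```python
import json
import math

VOCAB = ["{", "}", '"ok"', ":", "true", "false", "hello", "<eos>"]

# Automaton derived by hand from the target format: state -> {token: next state}.
TRANSITIONS = {
    0: {"{": 1},
    1: {'"ok"': 2},
    2: {":": 3},
    3: {"true": 4, "false": 4},
    4: {"}": 5},
    5: {"<eos>": 6},
}
ACCEPT_STATE = 6


def fake_logits() -> list:
    """Stand-in for a model: always scores the invalid token 'hello' highest."""
    scores = {tok: 0.0 for tok in VOCAB}
    scores["hello"] = 5.0
    scores["false"] = 1.0  # mild preference among the valid value tokens
    return [scores[tok] for tok in VOCAB]


def mask_logits(logits: list, state: int) -> list:
    """Set the logit of every token the automaton forbids to -inf."""
    allowed = TRANSITIONS[state]
    return [
        logit if tok in allowed else -math.inf
        for tok, logit in zip(VOCAB, logits)
    ]


def constrained_decode() -> str:
    """Greedy decoding where only automaton-approved tokens can be chosen."""
    state, output = 0, []
    while state != ACCEPT_STATE:
        masked = mask_logits(fake_logits(), state)
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        token = VOCAB[best]
        state = TRANSITIONS[state][token]
        if token != "<eos>":
            output.append(token)
    return "".join(output)


decoded = constrained_decode()
parsed = json.loads(decoded)  # always parses, despite the model's preference
```

Although the stand-in model favors `hello` at every step, that token is never emitted because its logit is masked before selection. Production systems apply the same idea over the full tokenizer vocabulary, typically with a pushdown automaton for nested formats.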
The following example uses OpenAI's native structured output with response_format to guarantee a valid JSON response matching a Pydantic schema:
```python
# OpenAI Structured Outputs with response_format and Pydantic
from openai import OpenAI
from pydantic import BaseModel


class MovieReview(BaseModel):
    title: str
    rating: float
    pros: list[str]
    cons: list[str]
    recommended: bool


client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract a structured movie review."},
        {
            "role": "user",
            "content": "Dune Part Two was visually stunning with great acting. "
            "The pacing dragged in the middle. 8.5/10, highly recommended.",
        },
    ],
    response_format=MovieReview,
)

review = completion.choices[0].message.parsed
print(f"{review.title}: {review.rating}/10 - Recommended: {review.recommended}")
print(f"Pros: {review.pros}")
```
Libraries that compile schemas into generation grammars:
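As a rough illustration of what compiling a schema into a grammar can mean, the sketch below turns a flat object schema into a regular expression. It is a toy under strong assumptions (fixed key order, three scalar types, no string escapes) and does not reproduce how any particular library works; `schema_to_regex` and `TYPE_PATTERNS` are hypothetical names.

```python
import re

# Regex fragments for a few JSON scalar types (simplified: no escapes in strings).
TYPE_PATTERNS = {
    "string": r'"[^"\\]*"',
    "integer": r"-?(?:0|[1-9][0-9]*)",
    "boolean": r"(?:true|false)",
}


def schema_to_regex(schema: dict) -> str:
    """Compile a flat object schema into a regex with a fixed key order."""
    parts = []
    for name, spec in schema["properties"].items():
        value_pattern = TYPE_PATTERNS[spec["type"]]
        parts.append(rf'"{re.escape(name)}":\s*{value_pattern}')
    return r"\{\s*" + r",\s*".join(parts) + r"\s*\}"


movie_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
    },
}

pattern = re.compile(schema_to_regex(movie_schema))
valid = pattern.fullmatch('{"title": "Dune", "year": 2021}') is not None
invalid = pattern.fullmatch('{"title": "Dune", "year": "2021"}') is not None
```

A regular expression like this can in turn be converted into a finite-state automaton and used to mask tokens during generation; nested or recursive schemas need a context-free grammar instead.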
SLOT (Structured LLM Output Transformer) (EMNLP Industry 2025): A model-agnostic approach using a lightweight fine-tuned model to transform unstructured LLM output into schema-compliant structured data.3) A fine-tuned Mistral-7B achieves 99.5% schema accuracy and 94.0% content similarity, and even compact models like Llama-3.2-1B can match larger proprietary models.
==== Instructor ==== 4)
LangChain provides a .with_structured_output() method for any supported model.