Structured Outputs

Structured outputs refer to techniques and tools that constrain LLM generation to produce well-formed data in a specified format (JSON, XML, SQL, code, etc.) rather than free-form text. This capability is essential for integrating LLMs into software systems where downstream components require predictable, parseable responses.

Why Structured Outputs Matter

LLMs natively produce free-form text, but production applications need:

Reliable parsing: API responses must conform to schemas for programmatic consumption
Type safety: Fields must have correct types (strings, numbers, booleans, arrays)
Completeness: All required fields must be present
Consistency: Outputs must follow the same structure across invocations
Integration: Structured data connects LLMs to databases, APIs, UI components, and tool pipelines

Without structured output guarantees, applications resort to brittle regex parsing, retry loops, and manual validation, all of which degrade reliability and increase latency.

Approaches to Structured Output

1. Prompting-Based

The simplest approach: instruct the model to output a specific format via the prompt.

Pros: Works with any model, no special tooling required
Cons: No guarantees; models may include preamble text, miss fields, or produce malformed output
Techniques: Few-shot examples, explicit format instructions, “respond only with valid JSON”

2. Function Calling / Tool Use

Model providers offer native function-calling interfaces where the model selects and populates structured function parameters:

OpenAI Function Calling (2023+): Model outputs JSON arguments matching a function schema; extended in 2024-2025 with parallel function calls and strict mode
Anthropic Tool Use: Claude models output structured tool calls with typed parameters; supports complex nested schemas
Google Gemini Function Calling: Similar structured invocation with grounding in Google Search and other tools

Function calling has become the de facto standard for structured agent interactions, serving as the backbone of tool utilization in modern agent frameworks.

3. Constrained Decoding

Intervenes during token generation to mask invalid tokens, guaranteeing schema compliance:

OpenAI Structured Outputs (2024-2025): Uses a Context-Free Grammar (CFG) engine to enforce JSON Schema compliance at generation time. GPT-4o and GPT-5 achieve 100% schema compliance in strict mode with ~50% latency reduction vs. unconstrained generation with retries.¹⁾
SGLang (2024-2025): High-performance serving framework with built-in constrained decoding for structured outputs
vLLM: Supports guided generation via Outlines integration
llama.cpp: Grammar-based sampling that constrains generation to GBNF grammars; achieves top performance on JSON Schema Store benchmarks

How it works: At each token generation step, a finite-state automaton or pushdown automaton derived from the target schema masks logits for tokens that would violate the schema. This guarantees structural validity without post-processing.

The following example uses OpenAI's native structured output with response_format to guarantee a valid JSON response matching a Pydantic schema:

# [[openai|OpenAI]] Structured Outputs with response_format and Pydantic
from [[openai|openai]] import [[openai|OpenAI]]
from pydantic import BaseModel
 
class MovieReview(BaseModel):
    title: str
    rating: float
    pros: list[str]
    cons: list[str]
    recommended: bool
 
client = [[openai|OpenAI]]()
 
completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract a structured movie review."},
        {"role": "user", "content": "Dune Part Two was visually stunning with great acting. "
         "The pacing dragged in the middle. 8.5/10, highly recommended."},
    ],
    response_format=MovieReview,
)
 
review = completion.choices[0].message.parsed
print(f"{review.title}: {review.rating}/10 - Recommended: {review.recommended}")
print(f"Pros: {review.pros}")

4. Grammar-Based Generation

Libraries that compile schemas into generation grammars:

Outlines (github.com/dottxt-ai/outlines|dottxt]], 2023-2025): Python library that compiles JSON Schema, regex, or CFG into efficient token masks; works with any HuggingFace model
Guidance²⁾ (github.com/guidance-ai/guidance|Microsoft]], 2023-2025): Interleaves generation with programmatic control flow; highest empirical coverage across benchmarks per JSONSchemaBench (2025)
LMQL (2023): SQL-like query language for LLMs with type constraints and scripted prompting
XGrammar (2024): High-performance grammar-based constrained decoding engine

5. Post-Processing Transformation

SLOT (Structured LLM Output Transformer) (EMNLP Industry 2025): A model-agnostic approach using a lightweight fine-tuned model to transform unstructured LLM output into schema-compliant structured data.³⁾ A fine-tuned Mistral-7B achieves 99.5% schema accuracy and 94.0% content similarity, and even compact models like Llama-3.2-1B can match larger proprietary models.

Libraries and Frameworks

==== Instructor ==== ⁴⁾

GitHub: https://github.com/jxnl/instructor
Built on Pydantic for schema definition and validation
Supports OpenAI, Anthropic, Google, Mistral, and open-source models
Automatic retries with validation error feedback

LangChain

GitHub: langchain-ai/langchain]]
.with_structured_output() method for any supported model
PydanticOutputParser and StructuredOutputParser
Integration with tool pipelines

BAML

Website: https://www.baml.dev/
Domain-specific language for defining LLM functions with typed inputs/outputs
Compiler generates type-safe client code
Built-in testing and validation

Marvin

GitHub: https://github.com/PrefectHQ/marvin
Lightweight Python library for structured extraction and classification
Uses Pydantic models as output schemas

Magentic

GitHub: https://github.com/jackmpcollins/magentic
Decorator-based interface: annotate functions with return types and get structured outputs
Works with Pydantic models for complex structured outputs

LlamaIndex

Website: llamaindex.ai/]]
Structured output support via Pydantic programs and output parsers
Deep integration with RAG pipelines

Pydantic

Website: https://docs.pydantic.dev/
The de facto standard for schema definition in Python-based structured output tools
JSON Schema generation used by most libraries above

Evaluation and Benchmarks

JSONSchemaBench (2025): Systematic benchmark evaluating constrained decoding across efficiency, coverage, and quality. Reveals that coverage drops on complex schemas (nested objects, conditional fields) even for leading frameworks.
SLOTBench (EMNLP 2025): Evaluates post-processing approaches on schema accuracy and content fidelity across diverse domains.
Key finding: Supervised fine-tuning combined with constrained decoding produces the best results; neither alone is sufficient for complex schemas.

Best Practices

Use native structured output modes when available (OpenAI strict mode, Anthropic tool use) for highest reliability
Define schemas with Pydantic for type safety, validation, and automatic JSON Schema generation
Include descriptions in schema fields to guide model generation with semantic context
Use constrained decoding for open-weight models to guarantee compliance
Implement retry with feedback: On validation failure, pass the error back to the model for correction (the approach Instructor uses)
Keep schemas simple: Deeply nested or highly conditional schemas reduce reliability across all approaches
Test with JSONSchemaBench or similar benchmarks to evaluate reliability before production deployment

References

¹⁾

OpenAI Structured Outputs Documentation. 2024.

²⁾

github.com/guidance-ai/guidance|Guidance: A Guidance Language for Controlling LLMs. Microsoft, 2023.]]

³⁾

github.com/dottxt-ai/outlines|Outlines: Structured Text Generation. dottxt, 2023.]]

⁴⁾

github.com/jxnl/instructor|Instructor: Structured LLM Outputs. jxnl, 2023.]]

AI Agent Knowledge Base

Sidebar

Table of Contents

Structured Outputs

Why Structured Outputs Matter

Approaches to Structured Output

1. Prompting-Based

2. Function Calling / Tool Use

3. Constrained Decoding

4. Grammar-Based Generation

5. Post-Processing Transformation

Libraries and Frameworks

LangChain

BAML

Marvin

Magentic

LlamaIndex

Pydantic

Evaluation and Benchmarks

Best Practices

See Also

References

AI Agent Knowledge Base

User Tools

Site Tools

Sidebar

Table of Contents

Structured Outputs

Why Structured Outputs Matter

Approaches to Structured Output

1. Prompting-Based

2. Function Calling / Tool Use

3. Constrained Decoding

4. Grammar-Based Generation

5. Post-Processing Transformation

Libraries and Frameworks

LangChain

BAML

Marvin

Magentic

LlamaIndex

Pydantic

Evaluation and Benchmarks

Best Practices

See Also

References

Page Tools