====== Instructor ======

**Instructor** is a lightweight Python library for extracting structured, validated data from large language models by leveraging Pydantic models and [[function_calling|function calling]]. Rather than parsing free-form text, Instructor patches LLM client libraries to return data conforming to user-defined schemas, with automatic retry logic and validation. Created by **Jason Liu** (jxnl), it supports 15+ LLM providers.(([[https://github.com/jxnl/instructor|github.com/jxnl/instructor]]))

  * **Website:** [[https://python.useinstructor.com|python.useinstructor.com]](([[https://python.useinstructor.com|python.useinstructor.com]]))
  * **[[github|GitHub]]:** [[https://github.com/jxnl/instructor|github.com/jxnl/instructor]]
  * **Install:** ''pip install instructor''
  * **License:** MIT
  * **Ports:** Python (primary), Go, TypeScript, Ruby, Elixir

===== How It Works =====

Instructor patches provider SDKs (like [[openai|OpenAI]]'s client) to add a ''response_model'' parameter that accepts any Pydantic model. The library handles schema generation, prompt construction, response parsing, and validation automatically.

<code python>
import instructor
from openai import OpenAI
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    occupation: str

client = instructor.from_openai(OpenAI())

person = client.chat.completions.create(
    model="gpt-4o",
    response_model=Person,
    messages=[{"role": "user", "content": "John is a 30-year-old software engineer."}],
)
# Returns: Person(name='John', age=30, occupation='software engineer')
</code>

===== Pydantic Integration =====

Instructor builds directly on **Pydantic** for schema definitions, providing:

  * **Type safety:** Standard Python type hints with IDE autocompletion
  * **Nested structures:** Complex models with lists, optionals, and nested objects
  * **Custom validators:** Pydantic validators for business-logic constraints
  * **Semantic validation:** LLM-powered validation for subjective criteria (e.g., "is this summary accurate?")
  * **Zero new syntax:** Uses standard Pydantic models, no framework-specific DSL

The following example extracts structured data from unstructured text with nested models and validation:

<code python>
# Extract structured contact info with nested models and auto-retry on validation failure
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

class Address(BaseModel):
    street: str
    city: str
    state: str

class Contact(BaseModel):
    name: str
    email: str
    address: Address

    @field_validator("email")
    @classmethod
    def validate_email(cls, v):
        if "@" not in v:
            raise ValueError("Invalid email format")
        return v

client = instructor.from_openai(OpenAI())

contact = client.chat.completions.create(
    model="gpt-4o",
    response_model=Contact,
    max_retries=3,  # auto-retries on validation failure
    messages=[{"role": "user", "content": "Jane Doe, jane@acme.com, lives at 123 Main St, Austin, TX"}],
)
print(contact.model_dump_json(indent=2))
</code>

===== Supported Providers =====

Instructor uses a unified interface via ''instructor.from_provider()'' or provider-specific patchers:

  * **[[openai|OpenAI]]** (GPT-4o, GPT-5, o3) - core integration
  * **[[anthropic|Anthropic]]** ([[claude|Claude]] 3.5, Claude 4)
  * **[[google|Google]]** (Gemini)
  * **[[cohere|Cohere]]**
  * **[[ollama|Ollama]]** (local models like Llama 3)
  * **[[deepseek|DeepSeek]], Together, Groq**
  * **llama-cpp-python** (local inference)
  * **Writer**
  * Any [[openai|OpenAI]]-compatible API
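Because the entry point is unified, the same response model can be reused across backends. The sketch below is illustrative rather than taken verbatim from the documentation: the ''"provider/model"'' string format, the specific model name, and the assumption that the bound model does not need to be passed again to ''create()'' all depend on the Instructor version and the provider SDKs installed.

<code python>
# Illustrative sketch (not verbatim from the docs): reusing one Pydantic schema
# across providers via from_provider(). The "provider/model" string format and
# the model name are assumptions; adjust them to the SDKs you have installed.
import instructor
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

# Swapping the string (e.g. to an Anthropic or Ollama model) changes the backend
# without touching the schema or the call site. The model is assumed to be taken
# from the provider string, so it is not passed to create() again here.
client = instructor.from_provider("openai/gpt-4o-mini")

person = client.chat.completions.create(
    response_model=Person,
    messages=[{"role": "user", "content": "Grace is a 45-year-old rear admiral."}],
)
print(person)
</code>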
===== Key Features =====

  * **Retry logic:** Automatic retries on validation failure using Tenacity integration, with configurable max attempts
  * **Streaming:** Support for partial responses and real-time list building
  * **Low abstraction:** Zero-overhead patch that can be enabled/disabled without refactoring
  * **Multimodal:** Support for vision inputs alongside text
  * **llms.txt:** Implements the llms.txt specification for documentation discoverability
  * **Iterable responses:** Stream lists of objects as they are generated (see the sketch at the end of this page)

===== Use Cases =====

  * Extracting structured data from unstructured text (invoices, emails, documents)
  * Building reliable data pipelines from LLM outputs
  * Classification and categorization with validated outputs
  * Content generation with schema-enforced structure
  * RAG systems requiring structured query decomposition

===== See Also =====

  * [[guidance|Guidance]]
  * [[quickcompare|QuickCompare]]

===== References =====

===== Related Pages =====

  * [[function_calling|OpenAI Function Calling]]
  * [[lite_llm|LiteLLM]]
  * [[tool_integration_patterns|Tool Integration Patterns]]
  * [[tool_augmented_language_models|Tool-Augmented Language Models]]
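The streaming and iterable-response features listed under Key Features follow the same ''response_model'' pattern as ordinary calls. The sketch below assumes a ''create_iterable''-style helper; the exact helper names (''create_iterable'', ''create_partial'') and their availability vary between Instructor versions, so treat this as a sketch rather than a guaranteed API.

<code python>
# Sketch of the "Iterable responses" feature: streaming a list of validated
# objects. The create_iterable helper name is an assumption and may differ
# between Instructor versions.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

client = instructor.from_openai(OpenAI())

# Each validated User is yielded as soon as the model finishes emitting it,
# instead of waiting for the whole completion to arrive.
users = client.chat.completions.create_iterable(
    model="gpt-4o",
    response_model=User,
    messages=[{"role": "user", "content": "Alice is 31, Bob is 27, and Carol is 45."}],
)

for user in users:
    print(user)
</code>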