Core Concepts
Reasoning Techniques
Memory Systems
Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools & Products
Code & Software
Safety & Security
Evaluation
Research
Development
Meta
DSPy (Declarative Self-improving Python) is a framework developed by Stanford NLP for programming — not prompting — language models. Rather than manually crafting prompts, DSPy lets developers define the behavior of LM-powered programs through structured signatures, composable modules, and algorithmic optimizers that automatically tune prompts, few-shot examples, and even model weights.
As of 2025, DSPy has reached version 3.0 and represents a paradigm shift in how developers build LM applications — treating language models as programmable modules analogous to layers in a neural network.
Signatures define the input/output contract for a language model call, similar to type annotations. Instead of writing a prompt, you declare what goes in and what comes out. DSPy automatically expands signatures into optimized prompts.
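As a conceptual sketch only (this is plain Python, not DSPy's internal implementation), a signature can be thought of as parsing a declaration like `'context, question -> answer'` into input and output field lists:

```python
# Conceptual sketch of how a signature declares an I/O contract.
# Illustrative only -- DSPy's real Signature class does much more
# (typing, field descriptions, prompt expansion).

def parse_signature(spec: str):
    """Split an 'inputs -> outputs' declaration into field lists."""
    inputs, outputs = spec.split('->')
    return ([f.strip() for f in inputs.split(',')],
            [f.strip() for f in outputs.split(',')])

inputs, outputs = parse_signature('context, question -> answer')
print(inputs)   # ['context', 'question']
print(outputs)  # ['answer']
```

DSPy also accepts this shorthand string form directly, e.g. `dspy.Predict('question -> answer')`, alongside the class-based form shown later.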
Modules are composable building blocks that implement specific LM invocation strategies. Built-in modules include ChainOfThought, ReAct, ProgramOfThought, and MultiChainComparison. Developers compose modules into programs like building blocks.
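The composition idea can be sketched as follows (a toy illustration, not DSPy's API: the class names and the stubbed "LM call" here are hypothetical):

```python
# Illustrative sketch of composing LM-invocation strategies as modules.
# Not DSPy's implementation; the LM call is stubbed as a string transform.

class Module:
    def __call__(self, text: str) -> str:
        raise NotImplementedError

class ChainOfThoughtSketch(Module):
    """Prepend a reasoning instruction before the (stubbed) LM call."""
    def __call__(self, text: str) -> str:
        return f"Let's think step by step. {text}"

class Pipeline(Module):
    """Compose modules by feeding each stage's output into the next."""
    def __init__(self, *stages: Module):
        self.stages = stages

    def __call__(self, text: str) -> str:
        for stage in self.stages:
            text = stage(text)
        return text

program = Pipeline(ChainOfThoughtSketch())
print(program("What is 2 + 2?"))
# Let's think step by step. What is 2 + 2?
```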
Optimizers (formerly called teleprompters or compilers) algorithmically tune the entire program. Given a metric and a small set of training examples (as few as 10-20), optimizers like BootstrapFewShot, MIPRO, and MIPROv2 automatically select the best instructions, few-shot demonstrations, and configurations. This eliminates manual prompt engineering.
The fundamental insight of DSPy is that prompt engineering is brittle and non-transferable. When you change your LLM, pipeline, or data, hand-tuned prompts break. DSPy addresses this by separating a program's logic (signatures and modules) from its parameterization (instructions, few-shot demonstrations, and weights), so the same program can simply be re-optimized when the model, pipeline, or data changes.
This approach can yield significant improvements: MIPROv2, for example, targets 20%+ performance gains on representative tasks with only limited labeled data.
```python
import dspy

# Configure the language model
lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)

# Define a signature for question answering
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc='often between 1 and 5 words')

# Create a module using ChainOfThought
qa = dspy.ChainOfThought(GenerateAnswer)

# Use it directly
pred = qa(question='What is the capital of France?')
print(pred.answer)  # Paris

# Optimize with training data and a metric
from dspy.teleprompt import BootstrapFewShot

def accuracy_metric(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

# A trainset is a list of dspy.Example objects with declared inputs
trainset = [
    dspy.Example(question='What is the capital of Germany?',
                 answer='Berlin').with_inputs('question'),
]

optimizer = BootstrapFewShot(metric=accuracy_metric, max_bootstrapped_demos=4)
compiled_qa = optimizer.compile(qa, trainset=trainset)
```
| Aspect | DSPy | LangChain |
|---|---|---|
| Paradigm | Declarative programming + automatic optimization | Chain-based orchestration with manual prompts |
| Prompt Engineering | Eliminated via optimizers | Manual, user-managed |
| Portability | Programs transfer across LMs | Prompts are model-specific |
| Abstraction | Signatures and modules | Chains and agents |
| Optimization | Built-in algorithmic compilers | No native self-improving capability |
| Philosophy | Treat LMs like neural network layers | Treat LMs like APIs to chain together |