Core Concepts
Reasoning Techniques
Memory Systems
Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools & Products
Code & Software
Safety & Security
Evaluation
Research
Development
Meta
DSPy (Declarative Self-improving Python) is a framework developed by Stanford NLP for programming — not prompting — language models. Rather than manually crafting prompts, DSPy lets developers define the behavior of LM-powered programs through structured signatures, composable modules, and algorithmic optimizers that automatically tune prompts, few-shot examples, and even model weights.
As of 2025, DSPy has reached version 3.0 and represents a paradigm shift in how developers build LM applications — treating language models as programmable modules analogous to layers in a neural network.
Signatures define the input/output contract for a language model call, similar to type annotations. Instead of writing a prompt, you declare what goes in and what comes out. DSPy automatically expands signatures into optimized prompts.
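As a conceptual sketch only (this is plain Python, not DSPy's internal implementation), a signature can be thought of as parsing a declaration like `'context, question -> answer'` into input and output field lists:

```python
# Conceptual sketch of how a signature declares an I/O contract.
# Illustrative only -- DSPy's real Signature class does much more
# (typing, field descriptions, prompt expansion).

def parse_signature(spec: str):
    """Split an 'inputs -> outputs' declaration into field lists."""
    inputs, outputs = spec.split('->')
    return ([f.strip() for f in inputs.split(',')],
            [f.strip() for f in outputs.split(',')])

inputs, outputs = parse_signature('context, question -> answer')
print(inputs)   # ['context', 'question']
print(outputs)  # ['answer']
```

DSPy also accepts this shorthand string form directly, e.g. `dspy.Predict('question -> answer')`, alongside the class-based form shown later.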
Modules are composable building blocks that implement specific LM invocation strategies. Built-in modules include ChainOfThought, ReAct, ProgramOfThought, and MultiChainComparison. Developers compose modules into programs like building blocks.
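The composition idea can be sketched as follows (a toy illustration, not DSPy's API: the class names and the stubbed "LM call" here are hypothetical):

```python
# Illustrative sketch of composing LM-invocation strategies as modules.
# Not DSPy's implementation; the LM call is stubbed as a string transform.

class Module:
    def __call__(self, text: str) -> str:
        raise NotImplementedError

class ChainOfThoughtSketch(Module):
    """Prepend a reasoning instruction before the (stubbed) LM call."""
    def __call__(self, text: str) -> str:
        return f"Let's think step by step. {text}"

class Pipeline(Module):
    """Compose modules by feeding each stage's output into the next."""
    def __init__(self, *stages: Module):
        self.stages = stages

    def __call__(self, text: str) -> str:
        for stage in self.stages:
            text = stage(text)
        return text

program = Pipeline(ChainOfThoughtSketch())
print(program("What is 2 + 2?"))
# Let's think step by step. What is 2 + 2?
```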
Optimizers (formerly called teleprompters or compilers) algorithmically tune the entire program. Given a metric and a small set of training examples (as few as 10-20), optimizers like BootstrapFewShot, MIPRO, and MIPROv2 automatically select the best instructions, few-shot demonstrations, and configurations. This eliminates manual prompt engineering.
The fundamental insight of DSPy is that prompt engineering is brittle and non-transferable. When you change your LLM, pipeline, or data, hand-tuned prompts break. DSPy addresses this by separating a program's logic (signatures and modules) from its parameterization (instructions, few-shot demonstrations, and weights), so the same program can simply be re-optimized when the model, pipeline, or data changes.
This approach can yield significant improvements: MIPROv2, for example, targets 20%+ performance gains on representative tasks with only limited labeled data.
```python
import dspy

# Configure the language model
lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)

# Define a signature for question answering
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc='often between 1 and 5 words')

# Create a module using ChainOfThought
qa = dspy.ChainOfThought(GenerateAnswer)

# Use it directly
pred = qa(question='What is the capital of France?')
print(pred.answer)  # Paris

# Optimize with training data and a metric
from dspy.teleprompt import BootstrapFewShot

def accuracy_metric(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

# A trainset is a list of dspy.Example objects with declared inputs
trainset = [
    dspy.Example(question='What is the capital of Germany?',
                 answer='Berlin').with_inputs('question'),
]

optimizer = BootstrapFewShot(metric=accuracy_metric, max_bootstrapped_demos=4)
compiled_qa = optimizer.compile(qa, trainset=trainset)
```
| Aspect | DSPy | LangChain |
|---|---|---|
| Paradigm | Declarative programming + automatic optimization | Chain-based orchestration with manual prompts |
| Prompt Engineering | Eliminated via optimizers | Manual, user-managed |
| Portability | Programs transfer across LMs | Prompts are model-specific |
| Abstraction | Signatures and modules | Chains and agents |
| Optimization | Built-in algorithmic compilers | No native self-improving capability |
| Philosophy | Treat LMs like neural network layers | Treat LMs like APIs to chain together |