====== DSPy ======

**DSPy** (Declarative Self-improving Python) is a framework developed by Stanford NLP for **programming — not prompting — language models**. Rather than manually crafting prompts, DSPy lets developers define the behavior of LM-powered programs through structured signatures, composable modules, and algorithmic optimizers that automatically tune prompts, few-shot examples, and even model weights. As of 2025, DSPy has reached version 3.0. Its central idea is to treat language models as programmable modules, analogous to layers in a neural network, rather than as targets for hand-written prompts.

===== Core Concepts =====

**Signatures** define the input/output contract for a language model call, similar to type annotations: instead of writing a prompt, you declare what goes in and what comes out, and DSPy automatically expands the signature into an optimized prompt.

**Modules** are composable building blocks that implement specific LM invocation strategies. Built-in modules include ''ChainOfThought'', ''ReAct'', ''ProgramOfThought'', and ''MultiChainComparison''; developers compose them into larger programs.

**Optimizers** (formerly called teleprompters or compilers) algorithmically tune the entire program. Given a metric and a small training set (as few as 10-20 examples), optimizers such as ''BootstrapFewShot'', ''MIPRO'', and ''MIPROv2'' automatically select the best instructions, few-shot demonstrations, and configurations, replacing manual prompt engineering.

===== Programming vs. Prompting =====

The fundamental insight of DSPy is that prompt engineering is brittle and non-transferable: when you change your LM, pipeline, or data, hand-tuned prompts break. DSPy addresses this by:

  * Treating LM calls as declarative modules with defined contracts
  * Using optimization algorithms to find the best prompts automatically
  * Making programs portable across different LMs (GPT-4, Llama, T5, etc.)
  * Enabling systematic iteration, similar to training neural networks

This approach has shown measurable gains: the DSPy roadmap targets 20%+ performance improvements from MIPROv2 on representative tasks with limited labeled data.

===== Code Example =====

<code python>
import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure the language model
lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)

# Define a signature for question answering
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc='often between 1 and 5 words')

# Create a module using ChainOfThought
qa = dspy.ChainOfThought(GenerateAnswer)

# Use it directly
pred = qa(question='What is the capital of France?')
print(pred.answer)  # Paris

# Optimize with training data and a metric
def accuracy_metric(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

# The training set is a list of dspy.Example objects whose input
# fields are marked with with_inputs()
trainset = [
    dspy.Example(question='What is the capital of Germany?',
                 answer='Berlin').with_inputs('question'),
    # ... more examples ...
]

optimizer = BootstrapFewShot(metric=accuracy_metric, max_bootstrapped_demos=4)
compiled_qa = optimizer.compile(qa, trainset=trainset)
</code>

===== How DSPy Differs from LangChain =====

^ Aspect ^ DSPy ^ LangChain ^
| Paradigm | Declarative programming + automatic optimization | Chain-based orchestration with manual prompts |
| Prompt engineering | Eliminated via optimizers | Manual, user-managed |
| Portability | Programs transfer across LMs | Prompts are model-specific |
| Abstraction | Signatures and modules | Chains and agents |
| Optimization | Built-in algorithmic compilers | No native self-improving capability |
| Philosophy | Treat LMs like neural network layers | Treat LMs like APIs to chain together |

===== References =====

  * [[https://arxiv.org/abs/2310.03714|Khattab et al., 2023 — DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines]]
  * [[https://github.com/stanfordnlp/dspy|DSPy GitHub Repository]]
  * [[https://dspy.ai|DSPy Official Website and Documentation]]
  * [[https://dspy.ai/roadmap/|DSPy Roadmap]]

===== See Also =====

  * [[langgraph]] — LangGraph for stateful agent workflows
  * [[crewai]] — CrewAI for role-based multi-agent teams
  * [[agent_evaluation]] — Benchmarks for evaluating AI systems
  * [[smolagents]] — HuggingFace lightweight agents
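
To make the optimizer idea above concrete, the bootstrapping strategy behind ''BootstrapFewShot'' can be sketched in plain Python. This is a toy model of the idea, not DSPy's actual implementation: the unoptimized program is run over the training set, and only the traces whose predictions pass the metric are kept as few-shot demonstrations. All names here (''toy_program'', ''bootstrap_few_shot'') are hypothetical illustrations.

<code python>
# Toy sketch of few-shot bootstrapping (not DSPy's actual code).
# The "program" maps a question to an answer; the bootstrapper keeps
# only demonstrations whose predictions pass the metric.

def toy_program(question):
    # Stand-in for an LM call: answers come from a small lookup table.
    knowledge = {
        'What is the capital of France?': 'Paris',
        'What is 2 + 2?': '5',  # deliberately wrong: will fail the metric
    }
    return knowledge.get(question, 'unknown')

def accuracy_metric(example, pred):
    return example['answer'].lower() == pred.lower()

def bootstrap_few_shot(program, trainset, metric, max_demos=4):
    """Collect (input, output) pairs where the program's own
    prediction passes the metric, up to max_demos demonstrations."""
    demos = []
    for example in trainset:
        pred = program(example['question'])
        if metric(example, pred):
            demos.append({'question': example['question'], 'answer': pred})
        if len(demos) >= max_demos:
            break
    return demos

trainset = [
    {'question': 'What is the capital of France?', 'answer': 'Paris'},
    {'question': 'What is 2 + 2?', 'answer': '4'},
]
demos = bootstrap_few_shot(toy_program, trainset, accuracy_metric)
print(demos)  # only the passing trace survives as a demonstration
</code>

In real DSPy, the surviving demonstrations would be inserted into the module's prompt as few-shot examples; the sketch only shows why a metric and a small training set are enough to drive the selection.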