Prompt chaining is an orchestration pattern in which the output of one language model call is used as the input for subsequent calls, enabling complex multi-step workflows that exceed the capabilities of a single prompt. By decomposing a large task into a sequence of smaller, well-defined subtasks, prompt chaining improves reliability, debuggability, and output quality while allowing each step to be independently optimized and validated. This technique forms the backbone of many agent frameworks and is widely used in production LLM applications for tasks ranging from document analysis to code generation pipelines. Research from 2024-2025 shows prompt chaining can improve accuracy by up to 15.6% over monolithic single-prompt approaches.
Several distinct chaining patterns have emerged, each suited to different task structures:
Sequential Chaining: The simplest and most common pattern. Prompts execute in a fixed linear order, with each output feeding directly into the next input. Ideal for ordered transformations like extract-then-summarize-then-translate pipelines. Each step has a focused, well-defined objective, making failures easy to diagnose and fix.
The following example demonstrates a sequential chain that summarizes text and then translates the summary:
```python
# Sequential prompt chain: summarize then translate
from openai import OpenAI

client = OpenAI()

def chain_step(system_prompt, user_input):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
    )
    return response.choices[0].message.content

article = "Large language models have transformed AI by enabling..."  # your text here

# Step 1: Summarize
summary = chain_step("Summarize this text in 2 sentences.", article)

# Step 2: Translate the summary (output of step 1 feeds into step 2)
translation = chain_step("Translate this English text to French.", summary)

print(f"Summary: {summary}\nTranslation: {translation}")
```
Parallel Chaining: Multiple independent prompts run simultaneously on the same or different inputs, with results aggregated afterward. Useful for multi-perspective analysis (e.g., evaluating a document for factual accuracy, tone, and completeness in parallel) or processing multiple chunks of a large document concurrently.
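A minimal sketch of parallel chaining using `asyncio`. The `call_llm` coroutine below is a hypothetical stub standing in for a real async API call (e.g. an async client), so the concurrency structure is visible without network access:

```python
import asyncio

# Hypothetical stub standing in for a real async LLM call.
async def call_llm(system_prompt: str, text: str) -> str:
    await asyncio.sleep(0)  # simulate network I/O
    return f"[{system_prompt}] {text[:40]}"

async def parallel_review(document: str) -> dict:
    # Three independent analyses of the same input, launched concurrently.
    prompts = {
        "accuracy": "Check this text for factual accuracy.",
        "tone": "Assess the tone of this text.",
        "completeness": "Check this text for completeness.",
    }
    tasks = {k: asyncio.create_task(call_llm(p, document)) for k, p in prompts.items()}
    # Await all results; a final aggregation call could synthesize the verdicts.
    return {k: await t for k, t in tasks.items()}

report = asyncio.run(parallel_review("Large language models have transformed AI..."))
```

Because the three prompts are independent, total latency is roughly that of the slowest single call rather than the sum of all three.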
Conditional Branching: Decision points route execution to different prompt chains based on the output of a prior step. For example, a classification step might route customer queries to different specialized chains (billing, technical support, general inquiry). By 2025, approximately 43% of organizations using LLM pipelines employ graph-based workflows with conditional routing.
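The routing idea can be sketched as follows. Here `call_llm` is a hypothetical stub whose keyword-based "classifier" stands in for a real classification prompt, so the branching logic runs offline:

```python
# Hypothetical stub standing in for a real LLM call.
def call_llm(prompt: str, text: str) -> str:
    if "classify" in prompt.lower():
        # Crude keyword classifier in place of a real classification prompt.
        if "charge" in text.lower() or "invoice" in text.lower():
            return "billing"
        if "error" in text.lower() or "crash" in text.lower():
            return "technical"
        return "general"
    return f"[{prompt}] {text}"

# Each branch is its own specialized chain (a single step here for brevity).
CHAINS = {
    "billing": lambda q: call_llm("Answer as a billing specialist.", q),
    "technical": lambda q: call_llm("Answer as a support engineer.", q),
    "general": lambda q: call_llm("Answer as a general assistant.", q),
}

def route(query: str) -> str:
    label = call_llm("Classify this query as billing, technical, or general.", query)
    handler = CHAINS.get(label, CHAINS["general"])  # fall back on unknown labels
    return handler(query)

answer = route("I was charged twice on my invoice")
```

Note the fallback to the general chain: routers should never crash on an unexpected classification label.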
Map-Reduce: A “map” step applies the same prompt to multiple data chunks independently (e.g., summarizing each section of a long document), then a “reduce” step synthesizes the results into a coherent whole. This pattern is essential for processing documents that exceed context limits and is heavily used in retrieval-augmented generation (RAG) pipelines.
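A compact sketch of the map-reduce pattern, again with a stubbed `call_llm` (an assumption, not a real API) so the shape of the pipeline is clear:

```python
# Hypothetical stub standing in for a real LLM call.
def call_llm(prompt: str, text: str) -> str:
    if "summarize" in prompt.lower():
        return f"summary({text[:15]}...)"
    return f"synthesis of: {text}"

def map_reduce_summarize(document: str, chunk_size: int = 200) -> str:
    # Map: summarize each chunk independently (these calls could run in parallel).
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    partials = [call_llm("Summarize this section.", chunk) for chunk in chunks]
    # Reduce: synthesize the partial summaries into one coherent result.
    return call_llm("Combine these section summaries into one overview.", "\n".join(partials))

overview = map_reduce_summarize("Large language models " * 40)
```

In practice the chunking step should split on semantic boundaries (sections, paragraphs) rather than fixed character offsets.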
Hierarchical Chains: Tasks are decomposed into nested subtask hierarchies. A high-level planner generates a task breakdown, and each subtask is handled by a specialized sub-chain. This mirrors how humans manage complex projects and is used in code generation pipelines where planning, implementation, and testing are separate hierarchical stages.
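The planner-then-subtasks structure can be sketched as two levels, with a stubbed `call_llm` (a hypothetical stand-in) returning a fixed plan so the decomposition is visible:

```python
# Hypothetical stub standing in for a real LLM call.
def call_llm(prompt: str, text: str) -> str:
    if "plan" in prompt.lower():
        return "design\nimplement\ntest"  # one subtask per line
    return f"done: {text}"

def hierarchical_chain(goal: str) -> list:
    # Top level: a planner decomposes the goal into subtasks.
    plan = call_llm("Break this goal into a plan, one subtask per line.", goal)
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
    # Lower level: each subtask is handled by its own sub-chain
    # (here a single call; in practice each could be a full chain).
    return [call_llm("Complete this subtask.", task) for task in subtasks]

results = hierarchical_chain("build a CLI tool")
```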
Feedback Loops: The output of a chain is evaluated (by another LLM call, a heuristic, or a human), and if it fails quality checks, the chain is re-executed with the feedback incorporated. This iterative refinement pattern is closely related to the Reflexion approach.
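A minimal generate-evaluate-retry loop, with stubbed `generate` and `evaluate` functions (both hypothetical stand-ins for LLM calls or heuristics) so the control flow runs offline:

```python
def generate(task: str, feedback: str = "") -> str:
    # Stub generator: first draft fails the check, the revision passes.
    return "short draft" if not feedback else "a fully revised, longer draft"

def evaluate(output: str):
    # Quality gate: return feedback on failure, None on success.
    return "Too short; expand it." if len(output) < 20 else None

def feedback_loop(task: str, max_rounds: int = 3) -> str:
    output = generate(task)
    for _ in range(max_rounds):
        feedback = evaluate(output)
        if feedback is None:
            return output  # passed the quality check
        output = generate(task, feedback)  # re-run with feedback incorporated
    return output  # best effort after max_rounds

final = feedback_loop("write an introduction")
```

The `max_rounds` cap matters: without it, an output the evaluator never accepts would loop forever and burn tokens.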
Robust prompt chains require validation gates between steps to catch and handle failures before they propagate. Common error-handling practices include validating intermediate outputs against an expected schema, retrying failed steps with the error fed back into the prompt, and falling back to simpler chains or safe defaults when retries are exhausted.
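A sketch of a validation gate with retries. The stubbed `call_llm` (a hypothetical stand-in, which here fails once before returning valid JSON) lets the gate-and-retry logic run offline:

```python
import json

# Hypothetical stub: returns malformed output on the first attempt,
# valid JSON afterward, to exercise the retry path.
def call_llm(prompt: str, text: str, attempt: int = 0) -> str:
    if attempt == 0:
        return "not json"
    return json.dumps({"title": "LLMs", "points": ["chaining"]})

def extract_with_gate(text: str, max_retries: int = 2) -> dict:
    last_err = None
    for attempt in range(max_retries + 1):
        raw = call_llm("Extract title and points as JSON.", text, attempt)
        try:
            data = json.loads(raw)  # gate 1: output must parse as JSON
        except json.JSONDecodeError as err:
            last_err = err
            continue  # retry instead of propagating bad output downstream
        if "title" in data and "points" in data:  # gate 2: required keys present
            return data
        last_err = ValueError("missing required keys")
    raise RuntimeError(f"step failed after {max_retries + 1} attempts") from last_err

record = extract_with_gate("some source text")
```

In a real chain the retry prompt would also include the failure reason, giving the model a chance to correct itself rather than repeating the same mistake.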
| Approach | Control Flow | Best For | Trade-offs |
|---|---|---|---|
| Single Prompt | One LLM call | Simple tasks within context limits | Limited complexity; no error recovery |
| Prompt Chaining | Predefined multi-step pipeline | Structured workflows with known steps | Predictable but inflexible; latency from multiple calls |
| Autonomous Agent (ReAct) | Dynamic LLM-driven loop | Open-ended tasks requiring adaptation | Flexible but harder to debug; unpredictable cost |
Prompt chaining occupies a middle ground: more capable than single prompts, more predictable and debuggable than fully autonomous agents. In production systems, chaining is preferred when the task structure is known in advance, while ReAct-style agents are used when the task requires dynamic tool selection and adaptive planning.
A common hybrid pattern uses prompt chaining for the overall workflow structure with ReAct-style agents at individual steps that require dynamic reasoning (e.g., a research step that needs to search and synthesize information adaptively).
Major frameworks supporting prompt chaining in 2025:
LangChain / LangGraph: Provides prompt templates, multi-step chains, agent handoffs, and graph-based workflow definitions. LangGraph extends LangChain with explicit state machines and conditional edges. Average production traces have grown to 7.7 steps as of 2024, reflecting the increasing complexity of deployed chains.
LlamaIndex: Focuses on RAG-oriented chaining with built-in support for sequential and parallel query pipelines, retrieval-then-synthesis chains, and router-based conditional dispatch across multiple indices.
DSPy: Takes a programmatic approach where chain steps are defined as modules with typed signatures. DSPy can automatically optimize prompt templates and few-shot examples for each step in the chain, treating prompt engineering as a compilation problem rather than a manual task.
Anthropic Tool Use / OpenAI Function Calling: Structured tool-calling interfaces enable branching and parallel execution via JSON schemas, providing framework-agnostic building blocks for prompt chains.
Best practices for implementation: