AI Agent Knowledge Base

A shared knowledge base for AI agents



AI Agent Design Patterns

AI Agent Design Patterns are reusable architectural solutions for building autonomous AI systems that perceive, reason, plan, act, and learn. As the field has matured from simple prompt-response interactions to goal-directed agentic systems, a rich taxonomy of patterns has emerged across reasoning, orchestration, memory, communication, reliability, and efficiency concerns.

This page serves as the definitive index of agent design patterns. Each pattern includes a brief description, guidance on when to use it, and the key frameworks that implement it.

“I think AI agent workflows will drive massive AI progress this year – perhaps even more than the next generation of foundation models.” – Andrew Ng, 2024 1)

Core Agentic Patterns (Andrew Ng)

In early 2024, Andrew Ng identified four foundational agentic design patterns that enable LLM-based agents to dramatically outperform zero-shot prompting – in some cases allowing GPT-3.5 to surpass GPT-4 on coding benchmarks. 2)

Reflection

An agent critiques its own output and uses that feedback to iteratively improve. The LLM generates a response, evaluates it for weaknesses (factual errors, missing detail, logical flaws), then revises – repeating until a quality threshold is met.

  • When to use: Writing, code generation, analysis tasks where iterative refinement yields measurably better output.
  • Frameworks: LangGraph (Evaluator-Optimizer workflow), Reflexion, AutoGen self-critique loops.
  • See also: Reflexion
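
The generate-critique-revise loop can be sketched in a few lines. This is a minimal illustration, not any framework's API: `llm` is a scripted placeholder standing in for a real model call, with separate prompts for answering, critiquing, and revising.

```python
# Minimal reflection loop: generate, critique, revise until the critique
# passes or a round limit is hit. `llm` is a stand-in for a real model call.
def llm(prompt: str) -> str:
    # Placeholder: canned responses so the loop is runnable offline.
    if prompt.startswith("CRITIQUE"):
        return "OK" if "revised" in prompt else "Too vague; add detail."
    if prompt.startswith("REVISE"):
        return "revised draft with more detail"
    return "first draft"

def reflect(task: str, max_rounds: int = 3) -> str:
    draft = llm(f"ANSWER: {task}")
    for _ in range(max_rounds):
        critique = llm(f"CRITIQUE the following answer:\n{draft}")
        if critique.strip() == "OK":          # quality threshold met
            break
        draft = llm(f"REVISE using this critique:\n{critique}\n---\n{draft}")
    return draft

print(reflect("Explain retries"))  # → "revised draft with more detail"
```

The round limit matters in practice: without it, a model that never judges its own work "good enough" loops indefinitely.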

Tool Use / ReAct

The agent dynamically selects and invokes external tools (APIs, databases, search engines, code interpreters) during its reasoning loop. Rather than relying solely on parametric knowledge, the agent extends its capabilities through function calling.

  • When to use: Any task requiring access to current data, computation, or external systems.
  • Frameworks: LangChain tools, LlamaIndex, OpenAI function calling, Anthropic tool use, Gemini ADK.
  • See also: ReAct Framework

Planning

The agent decomposes a complex goal into a sequence of smaller, manageable sub-tasks before execution. Planning can be upfront (decomposition-first for well-defined problems) or interleaved (plan-act-reflect-repeat for uncertain environments).

  • When to use: Multi-step tasks, research workflows, software development, any problem requiring coordinated sub-goals.
  • Frameworks: LangGraph Plan-and-Execute, LlamaIndex query planning, AutoGen task decomposition.
  • See also: Planning

Multi-Agent Collaboration

Multiple specialized agents work together, splitting tasks and coordinating results. Each agent has a defined role (researcher, coder, reviewer) and agents may debate, delegate, or vote to reach superior solutions.

  • When to use: Complex workflows exceeding a single agent's context or capability, tasks benefiting from diverse perspectives.
  • Frameworks: CrewAI, AutoGen, LangGraph multi-agent, OpenAI Swarm, Camel.

Reasoning Patterns

Reasoning patterns determine how an agent thinks – whether it commits to a single chain of logic, explores branching alternatives, or iterates on its own reasoning.

Chain of Thought (CoT)

The foundational reasoning pattern. The model generates explicit intermediate reasoning steps before producing a final answer, either via few-shot examples or the zero-shot prompt “Let's think step by step.” 3)

  • When to use: Arithmetic, commonsense reasoning, any task where step-by-step reasoning reduces errors.
  • Frameworks: Native in all major LLMs; explicitly supported in LangChain, Gemini ADK.

Tree of Thoughts (ToT)

Extends CoT by exploring multiple reasoning branches in parallel, evaluating each path, and backtracking when needed. The model maintains a tree of partial solutions and uses search algorithms (BFS/DFS) to find the best path. 4)

  • When to use: Creative problem-solving, puzzle-solving, tasks with multiple valid approaches where exploration improves outcomes.
  • Frameworks: LangChain ToT, custom implementations with LangGraph.
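
The BFS variant can be sketched with a beam search over partial solutions. In this toy version the `expand` and `score` functions are hypothetical stand-ins for LLM-driven thought proposal and evaluation; the "problem" is simply reaching 24 by doubling or incrementing.

```python
# Breadth-first Tree of Thoughts sketch: expand each partial path into
# candidate next thoughts, score them, and keep only the best `beam` paths.
def expand(path):
    # Propose two candidate next thoughts (a real system would ask the LLM).
    return [path + [path[-1] * 2], path + [path[-1] + 1]]

def score(path):
    return -abs(24 - path[-1])   # toy evaluator: distance from the goal 24

def tree_of_thoughts(root, depth=3, beam=2):
    frontier = [[root]]
    for _ in range(depth):
        candidates = [child for p in frontier for child in expand(p)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]   # prune
    return max(frontier, key=score)

print(tree_of_thoughts(3))  # → [3, 6, 12, 24]
```

The beam width is the key tuning knob: it trades LLM-call cost against how much of the tree gets explored.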

Graph of Thoughts (GoT)

Generalizes ToT by modeling reasoning as a directed graph rather than a tree. Thought nodes can be combined, refined, or aggregated from multiple branches, enabling non-linear reasoning flows. 5)

  • When to use: Complex problems where insights from different reasoning paths should be merged (e.g., document summarization, multi-constraint optimization).
  • Frameworks: Research implementations; emerging LangGraph support.

Chain of Draft

A token-efficient variant of CoT where the model generates minimal, concise intermediate steps rather than verbose reasoning traces. Achieves comparable accuracy to CoT while using significantly fewer tokens. 6)

  • When to use: Cost-sensitive or latency-sensitive applications where full CoT is too expensive.
  • Frameworks: Prompt-level technique applicable to any LLM.

Self-Consistency

Samples multiple independent reasoning paths for the same problem, then selects the most common final answer by majority vote. Reduces variance and improves reliability over single-path CoT. 7)

  • When to use: High-stakes decisions where a single reasoning trace may be unreliable; math and logic problems.
  • Frameworks: Prompt-level technique; supported in LangChain via parallel chain execution.
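
The voting step is straightforward; the sketch below stubs the sampling phase with a canned list standing in for several temperature > 0 LLM completions.

```python
from collections import Counter

# Self-consistency sketch: sample several independent reasoning paths,
# extract each final answer, and take the majority vote.
def sample_answers(question: str, n: int = 5):
    # Placeholder for n independent LLM samples at temperature > 0.
    return ["42", "42", "41", "42", "40"][:n]

def self_consistent_answer(question: str, n: int = 5) -> str:
    votes = Counter(sample_answers(question, n))
    return votes.most_common(1)[0][0]

print(self_consistent_answer("6 * 7 = ?"))  # → "42"
```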

ReAct (Reason + Act)

Interleaves reasoning traces with tool actions and observations in a structured loop: Thought → Action → Observation → repeat. The critical implementation detail is the stop sequence – generation halts at “Observation:” so the runtime provides real data rather than the model hallucinating it. 8)

  • When to use: The default agent pattern for approximately 90% of production agents. Any task combining reasoning with external tool access.
  • Frameworks: LangChain, LangGraph, LlamaIndex, Gemini ADK, Anthropic Claude agents.
  • See also: ReAct Framework
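
The loop and the stop-sequence trick can be sketched as follows. `fake_llm` is a scripted stand-in whose outputs already end at "Observation:", simulating a runtime that cuts generation there; the tool registry holds one toy calculator.

```python
import re

# ReAct loop sketch: the model emits Thought/Action lines, generation halts
# at "Observation:", and the runtime (not the model) supplies tool output.
TOOLS = {"calc": lambda expr: str(eval(expr))}  # toy single-tool registry

SCRIPT = iter([
    "Thought: I need to compute this.\nAction: calc[2 + 3]\nObservation:",
    "Thought: I have the result.\nFinal Answer: 5",
])

def fake_llm(transcript: str) -> str:
    # Stand-in for a completion call with stop=["Observation:"].
    return next(SCRIPT)

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        out = fake_llm(transcript)
        transcript += out
        if "Final Answer:" in out:
            return out.split("Final Answer:")[1].strip()
        m = re.search(r"Action: (\w+)\[(.*)\]", out)
        tool, arg = m.group(1), m.group(2)
        transcript += " " + TOOLS[tool](arg) + "\n"   # inject real observation
    return "gave up"

result = react("What is 2 + 3?")
print(result)  # → "5"
```

Note that the observation is appended by the runtime before the next model call, which is exactly what prevents the model from hallucinating tool results.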

Reflexion

Extends ReAct with an explicit self-reflection step. After task completion (or failure), the agent generates a verbal critique of its performance and stores it in memory for future attempts, enabling learning across episodes. 9)

  • When to use: Multi-attempt tasks like coding challenges, where learning from mistakes significantly improves success rate.
  • Frameworks: LangGraph, custom implementations.
  • See also: Reflexion

Self-Refine

A single-model iterative refinement loop: the LLM generates output, critiques it, then refines it – without external feedback. Unlike Reflection with separate evaluator models, Self-Refine uses the same model for all three steps. 10)

  • When to use: Tasks where iterative polishing improves quality (writing, code review, translation) and a single model suffices.
  • Frameworks: Prompt-level technique; easily implemented in any orchestration framework.

Orchestration Patterns

Orchestration patterns determine how multiple agents or components coordinate to accomplish complex workflows. These patterns mirror distributed systems architectures. 11)

Supervisor / Manager Pattern

A central supervisor agent receives user requests, decomposes them into sub-tasks, delegates to specialized worker agents, reviews results, and synthesizes a final response. The supervisor has delegation tools; workers have domain-specific tools. 12)

  • When to use: Production systems requiring deterministic control, quality review of worker outputs, and clear routing logic.
  • Frameworks: LangGraph (supervisor mode), AutoGen, CrewAI (manager agent).

Peer-to-Peer / Swarm

Agents communicate directly with each other without a central coordinator. Each agent decides independently when to hand off work, creating emergent collaborative behavior. This offers high flexibility, but the system is harder to debug and its costs are harder to bound. 13)

  • When to use: Creative/exploratory tasks, brainstorming, tasks where rigid structure would limit emergent solutions.
  • Frameworks: OpenAI Swarm, AutoGen group chat, Camel.

Hierarchical Delegation

A multi-level tree of supervisor agents, where top-level supervisors delegate to mid-level supervisors who further delegate to worker agents. Enables scaling to large agent organizations while maintaining control at each level.

  • When to use: Enterprise-scale workflows with clearly defined organizational structure (e.g., research then analysis then writing then review).
  • Frameworks: LangGraph (nested subgraphs), AutoGen hierarchical teams, CrewAI hierarchical process.

Pipeline / Sequential

Agents process tasks in a fixed linear order, with each agent's output feeding directly into the next. The simplest multi-agent pattern – an “assembly line” for deterministic workflows. 14)

  • When to use: Data ETL, document processing pipelines, any workflow with clear sequential dependencies.
  • Frameworks: LangChain LCEL, LangGraph, Haystack pipelines, CrewAI sequential process.

Map-Reduce for Agents

A coordinator fans out identical or varied sub-tasks to multiple agents in parallel (map phase), then aggregates their results into a unified output (reduce phase). Dramatically reduces latency for parallelizable workloads.

  • When to use: Bulk document analysis, parallel research across multiple sources, any task with independent sub-problems.
  • Frameworks: LangGraph (fan-out/fan-in), Anthropic orchestrator-workers pattern, CrewAI.
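
The fan-out/fan-in structure maps directly onto a thread pool. In this sketch the `worker` function (a word-count stub) stands in for a per-document agent or LLM call, and the reduce phase is a simple sum.

```python
from concurrent.futures import ThreadPoolExecutor

# Map-reduce sketch: fan identical sub-tasks out to parallel workers,
# then aggregate the partial results into one answer.
def worker(doc: str) -> int:
    return len(doc.split())          # stand-in for an LLM call per document

def map_reduce(docs):
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(worker, docs))   # map phase, in parallel
    return sum(partials)                          # reduce phase

print(map_reduce(["one two", "three four five", "six"]))  # → 6
```

Threads suffice here because agent sub-tasks are I/O-bound (waiting on API responses), not CPU-bound.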

Router Pattern (Semantic Routing)

An LLM-based router classifies incoming requests and directs them to the most appropriate specialized agent or workflow. Acts as an intelligent dispatcher using semantic understanding rather than rule-based routing.

  • When to use: Systems handling diverse request types that require different capabilities or toolsets.
  • Frameworks: LangGraph conditional edges, Anthropic router workflow, Semantic Router library.
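
The dispatch structure can be sketched as below. The keyword scorer is a deliberately crude stand-in for an LLM or embedding-based classifier; the route names and handlers are hypothetical.

```python
# Router sketch: classify a request, then dispatch to a specialist handler.
# The keyword scorer stands in for an LLM/embedding-based classifier.
ROUTES = {
    "billing": ["invoice", "refund", "charge"],
    "tech":    ["error", "crash", "bug"],
}

HANDLERS = {
    "billing": lambda q: f"billing agent handles: {q}",
    "tech":    lambda q: f"tech agent handles: {q}",
    "general": lambda q: f"general agent handles: {q}",   # default route
}

def route(query: str) -> str:
    words = query.lower().split()
    scores = {name: sum(w in kws for w in words) for name, kws in ROUTES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"

def dispatch(query: str) -> str:
    return HANDLERS[route(query)](query)

print(dispatch("please refund my last invoice"))  # routed to billing
```

The important design point survives the simplification: routing returns a label, and dispatch maps labels to agents, so new specialists can be added without touching the classifier's callers.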

Memory Patterns

Memory patterns determine what an agent remembers and how – from immediate conversational context to persistent cross-session knowledge. Andrew Ng has called memory engineering “one of the most debated topics in AI right now.” 15)

Short-Term Memory (Context Window)

The most basic memory: the LLM's context window holds recent conversation turns, system instructions, and immediate task context. It is volatile and bounded by the model's token limit, and is typically managed via truncation, summarization, or sliding-window strategies.

  • When to use: Always present; the foundation all other memory types build upon.
  • Frameworks: Native in all LLMs; LangChain ConversationBufferMemory, LlamaIndex chat history.
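
A sliding-window strategy can be sketched in a few lines. This is an illustrative implementation, not any framework's; tokens are approximated by whitespace-split words, where a real system would use the model's tokenizer.

```python
# Sliding-window short-term memory sketch: keep only the most recent turns
# that fit a token budget (tokens approximated as whitespace words here).
class SlidingWindowMemory:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.turns = []

    def add(self, turn: str):
        self.turns.append(turn)
        # Evict oldest turns until the window fits the budget again.
        while sum(len(t.split()) for t in self.turns) > self.max_tokens:
            self.turns.pop(0)

    def context(self) -> str:
        return "\n".join(self.turns)

mem = SlidingWindowMemory(max_tokens=6)
for turn in ["user: hi there", "bot: hello friend", "user: what is up"]:
    mem.add(turn)
print(mem.turns)  # oldest turns evicted to stay within the budget
```

Summarization-based management differs only in the eviction step: instead of dropping old turns, it compresses them into a running summary turn.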

Long-Term Memory (Vector Store)

Persistent storage of facts, preferences, and knowledge across sessions using vector embeddings for semantic retrieval. The agent can store and recall information that persists indefinitely, enabling learning over time.

  • When to use: Personalization, knowledge accumulation, cross-session continuity, RAG-enhanced agents.
  • Frameworks: Mem0, Zep, Letta (MemGPT), Cognee, Qdrant, Chroma, Pinecone, pgvector. 16)

Episodic Memory

Records specific past events, interactions, and outcomes as discrete episodes with temporal metadata. Unlike semantic long-term memory which stores facts, episodic memory preserves the narrative context – what happened, when, and what the outcome was. 17)

  • When to use: Agents that need to learn from past successes/failures, avoid repeating mistakes, or reference prior interactions.
  • Frameworks: Mem0, Zep/Graphiti (temporal knowledge graphs), LangGraph checkpointing.

Working Memory / Scratchpad

A temporary workspace for intermediate reasoning artifacts – draft plans, partial results, hypotheses being tested – separate from the main conversation context. Analogous to human working memory for active problem-solving. 18)

  • When to use: Complex multi-step reasoning, plan-and-execute workflows, any task requiring intermediate state management.
  • Frameworks: LangGraph state management, Redis-backed scratchpads, MemGPT/Letta virtual context.

Communication Patterns

Communication patterns govern how agents interact with humans and each other during task execution.

Human-in-the-Loop (HITL)

The agent handles routine operations autonomously but escalates edge cases, high-stakes decisions, or low-confidence outputs to a human for review and approval. The level of autonomy can be tuned from full human oversight to sparse supervision.

  • When to use: Production systems in regulated industries, high-consequence decisions, building user trust during early deployment.
  • Frameworks: LangGraph interrupt/approval nodes, CrewAI human input tool, Anthropic tool use with confirmation.

Agent-to-Agent Messaging

Agents communicate through structured message protocols, enabling interoperability across different frameworks and organizations. Standardized protocols are replacing ad-hoc integration.

  • When to use: Cross-framework agent collaboration, enterprise systems with agents from different vendors.
  • Frameworks: Google A2A (Agent-to-Agent protocol), Anthropic MCP (Model Context Protocol), FIPA-ACL.

Shared Blackboard

All agents read from and write to a shared knowledge store (the “blackboard”). Agents independently contribute partial solutions, and any agent can build upon another's contributions. Decoupled communication – agents don't need to know about each other.

  • When to use: Problems requiring incremental, collaborative knowledge building where agents have diverse specializations.
  • Frameworks: Custom implementations with shared vector stores, Redis pub/sub, LangGraph shared state.

Event-Driven

Agents subscribe to event streams and react to relevant events asynchronously. Enables loose coupling – agents can be added or removed without modifying the overall system.

  • When to use: Real-time monitoring, workflow automation triggered by external events, scalable microservice-style agent systems.
  • Frameworks: Apache Kafka + agent consumers, AWS EventBridge, custom event buses with LangGraph.

Reliability Patterns

Reliability patterns ensure agents behave predictably and recover gracefully from the inevitable failures in production systems. 19)

Retry with Backoff

When a tool call or LLM request fails, the agent retries with exponentially increasing delays. Handles transient network errors, rate limits, and temporary service outages without manual intervention.

  • When to use: Any agent making external API calls or LLM requests in production.
  • Frameworks: LangGraph stateful retries, tenacity (Python), built-in to most LLM SDK clients.
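
A minimal version with full jitter looks like the following; `flaky` is a hypothetical tool call that succeeds on its third attempt.

```python
import random
import time

# Retry-with-backoff sketch: exponentially growing, jittered delays.
def retry(fn, attempts: int = 4, base: float = 0.1):
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise                      # out of attempts: surface the error
            # Full-jitter exponential backoff: up to 0.1s, 0.2s, 0.4s, ...
            time.sleep(random.uniform(0, base * 2 ** i))

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = retry(flaky)
print(result, "after", calls["n"], "attempts")
```

Jitter matters when many agents retry simultaneously: without it, synchronized retries can re-stampede the recovering service.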

Fallback Chains

Defines a cascade of backup strategies when the primary approach fails: try a simpler prompt, fall back to a different model, degrade gracefully to a cached response, or escalate to human handoff.

  • When to use: High-availability systems where agent failure must not block the user.
  • Frameworks: LangChain fallback chains, LangGraph conditional routing on failure, LiteLLM model fallbacks.
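
The cascade reduces to trying strategies in order until one succeeds. The three strategies here (a failing primary model, a cheaper model, a cached response) are hypothetical stand-ins.

```python
# Fallback chain sketch: try each strategy in order, return the first success.
def fallback_chain(strategies, query):
    last_err = None
    for name, fn in strategies:
        try:
            return name, fn(query)
        except Exception as e:      # on failure, degrade to the next option
            last_err = e
    raise RuntimeError("all fallbacks exhausted") from last_err

def primary(q):       raise TimeoutError("model overloaded")
def cheaper_model(q): return f"cheap answer to {q!r}"
def cached(q):        return "stale cached answer"

name, answer = fallback_chain(
    [("primary", primary), ("cheap", cheaper_model), ("cache", cached)], "hi")
print(name, answer)  # the cheaper model serves the request
```

Returning which strategy served the request (not just the answer) is worth keeping: it feeds monitoring on how often the system is running degraded.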

Circuit Breaker

After repeated failures of a particular tool or service, the agent stops calling it for a cooldown period rather than continuing to fail. Prevents cascade failures and wasted resources. Borrowed directly from microservice architecture. 20)

  • When to use: Agents depending on unreliable external services; production systems with strict latency budgets.
  • Frameworks: Custom middleware; emerging support in LangGraph and agent gateway layers.
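
The three states (closed, open, half-open) can be sketched as a small wrapper; `bad_tool` is a hypothetical always-failing service call used to trip the breaker.

```python
import time

# Circuit-breaker sketch: after `threshold` consecutive failures the breaker
# opens and rejects calls until `cooldown` seconds pass; then one trial call
# (half-open) decides whether it closes again.
class CircuitBreaker:
    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open; skipping call")
            self.opened_at = None          # half-open: allow one trial call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures = 0                  # success closes the circuit
        return result

breaker = CircuitBreaker(threshold=2, cooldown=60)
def bad_tool():
    raise ConnectionError("service down")

for _ in range(2):
    try:
        breaker.call(bad_tool)
    except ConnectionError:
        pass
try:
    breaker.call(bad_tool)                 # rejected without touching the tool
except RuntimeError as e:
    print(e)
```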

Guardrails / Output Validation

Structured checks on agent inputs and outputs: schema validation, toxicity filtering, factual consistency checks, PII detection, and policy compliance. Acts as a safety net between the agent's reasoning and its final output.

  • When to use: Every production agent system. Non-negotiable for customer-facing applications.
  • Frameworks: Guardrails AI, NeMo Guardrails (NVIDIA), LlamaGuard, Anthropic constitutional AI, custom validators.

Dual LLM (Planner + Executor)

Two models serve different roles: a stronger/larger model handles planning and critical decisions, while a smaller/faster/cheaper model handles routine execution. Optimizes the cost-quality tradeoff across the agent workflow.

  • When to use: Cost-sensitive production systems, high-throughput pipelines where not every step requires frontier-model reasoning.
  • Frameworks: LangGraph (heterogeneous model assignment), LiteLLM routing, custom orchestration.

Efficiency Patterns

Efficiency patterns optimize cost, latency, and resource usage – critical concerns as agents scale from prototypes to production.

Caching (Semantic + Exact)

Exact caching stores responses keyed by identical inputs. Semantic caching uses embedding similarity to reuse responses for semantically equivalent (but differently worded) queries. Both reduce redundant LLM calls, cutting cost and latency.

  • When to use: High-traffic agents with repetitive queries; cost optimization for expensive models.
  • Frameworks: Redis (semantic search), GPTCache, Mem0 relevance-scored caching, LangChain cache backends.
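
Both lookup modes can be sketched together. The bag-of-words `embed` and cosine similarity below are toy stand-ins for learned embeddings and a vector index; the similarity threshold is the knob that trades hit rate against the risk of serving a wrong cached answer.

```python
import math

# Cache sketch: exact lookup first, then a semantic match using cosine
# similarity over toy bag-of-words vectors.
def embed(text: str) -> dict:
    vec = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold, self.store = threshold, {}

    def get(self, query: str):
        if query in self.store:                      # exact hit
            return self.store[query]
        q = embed(query)
        for key, value in self.store.items():        # semantic hit
            if cosine(q, embed(key)) >= self.threshold:
                return value
        return None

    def put(self, query: str, response: str):
        self.store[query] = response

cache = SemanticCache(threshold=0.7)
cache.put("how do I reset my password", "Use the reset link.")
print(cache.get("how do I reset my password please"))  # semantic hit
```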

Speculative Execution

The agent generates multiple candidate responses or tool call paths in parallel, verifying and selecting the fastest valid result. Trades compute for latency – particularly effective when validation is cheap relative to generation.

  • When to use: Latency-critical applications; tasks where multiple valid approaches exist and the fastest matters.
  • Frameworks: LangGraph parallel branches, custom implementations, speculative decoding in LLM inference engines (vLLM).

Budget-Aware Reasoning

The agent monitors its token usage, API costs, or wall-clock time and adjusts its reasoning depth accordingly. May use cheaper models for simple sub-tasks, skip optional refinement steps, or invoke early stopping when confident.

  • When to use: Production systems with cost budgets, metered API access, or strict latency SLAs.
  • Frameworks: LangGraph step budgets, Mem0 relevance scoring and forgetting, custom token tracking middleware.

Parallel Tool Calling

The agent invokes multiple independent tools simultaneously rather than sequentially. Requires the orchestration layer to detect independence between tool calls and dispatch them concurrently.

  • When to use: Any agent workflow with independent tool calls (e.g., searching multiple sources, fetching data from multiple APIs).
  • Frameworks: OpenAI parallel function calling, Anthropic parallel tool use, LangGraph fan-out, AutoGen concurrent execution.
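
With an async runtime, concurrent dispatch is one `gather` call. The two search tools below are hypothetical stubs whose `sleep` stands in for network latency; the point is that total wait approaches the slowest call, not the sum.

```python
import asyncio

# Parallel tool-calling sketch: independent tool calls are dispatched
# concurrently with asyncio.gather instead of awaited one by one.
async def search_web(q: str) -> str:
    await asyncio.sleep(0.1)      # stand-in for network latency
    return f"web results for {q}"

async def search_docs(q: str) -> str:
    await asyncio.sleep(0.1)
    return f"doc results for {q}"

async def answer(q: str) -> str:
    # Both calls run concurrently; wait ≈ max, not sum, of latencies.
    web, docs = await asyncio.gather(search_web(q), search_docs(q))
    return f"{web} | {docs}"

print(asyncio.run(answer("vector databases")))
```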

Pattern Selection Guide

Concern                  | Start With          | Scale To
-------------------------|---------------------|--------------------------------------------------
Single-agent reasoning   | CoT / ReAct         | ToT / Reflexion / Self-Consistency
Multi-step tasks         | Planning + Tool Use | Map-Reduce / Hierarchical Delegation
Multi-agent coordination | Supervisor          | Hierarchical / Swarm (depending on control needs)
Memory                   | Short-term + RAG    | Episodic + Working Memory + Knowledge Graphs
Reliability              | Guardrails + Retry  | Circuit Breaker + Fallback Chains + Dual LLM
Efficiency               | Caching             | Budget-Aware + Parallel Tool Calling + Speculative
