AI Agent Design Patterns are reusable architectural solutions for building autonomous AI systems that perceive, reason, plan, act, and learn. As the field has matured from simple prompt-response interactions to goal-directed agentic systems, a rich taxonomy of patterns has emerged across reasoning, orchestration, memory, communication, reliability, and efficiency concerns.
This page serves as the definitive index of agent design patterns. Each pattern includes a brief description, guidance on when to use it, and the key frameworks that implement it.
“I think AI agent workflows will drive massive AI progress this year – perhaps even more than the next generation of foundation models.” – Andrew Ng, 2024 1)
In early 2024, Andrew Ng identified four foundational agentic design patterns that enable LLM-based agents to dramatically outperform zero-shot prompting – in some cases allowing GPT-3.5 to surpass GPT-4 on coding benchmarks. 2)
An agent critiques its own output and uses that feedback to iteratively improve. The LLM generates a response, evaluates it for weaknesses (factual errors, missing detail, logical flaws), then revises – repeating until a quality threshold is met.
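The loop can be sketched as follows; `generate`, `critique`, and `revise` are hypothetical stand-ins for LLM calls, with a toy critique that converges after one revision:

```python
# Minimal Reflection sketch: generate, critique, revise until no issues remain.

def generate(task: str) -> str:
    return f"draft answer for: {task}"

def critique(draft: str) -> list[str]:
    # Return a list of identified weaknesses; empty list means "good enough".
    return ["missing detail"] if "revised" not in draft else []

def revise(draft: str, issues: list[str]) -> str:
    return f"revised ({', '.join(issues)} fixed): {draft}"

def reflect(task: str, max_rounds: int = 3) -> str:
    draft = generate(task)
    for _ in range(max_rounds):
        issues = critique(draft)
        if not issues:          # quality threshold met
            break
        draft = revise(draft, issues)
    return draft
```

A real implementation would bound the loop by cost as well as round count, since each critique/revise cycle is another model call.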
The agent dynamically selects and invokes external tools (APIs, databases, search engines, code interpreters) during its reasoning loop. Rather than relying solely on parametric knowledge, the agent extends its capabilities through function calling.
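In most frameworks this reduces to a tool registry plus a dispatch step: the model emits a structured call, the runtime executes it. A minimal sketch (the model output here is mocked; the JSON shape is an assumption, not any particular API's format):

```python
import json

# Tool registry the runtime exposes to the model.
TOOLS = {
    "add": lambda a, b: a + b,
    "search": lambda query: f"results for {query!r}",
}

def model_decides(prompt: str) -> str:
    # Hypothetical LLM output: a structured tool call as JSON.
    return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})

def run_tool_call(raw: str):
    call = json.loads(raw)
    # Look up the named tool and invoke it with the model-supplied arguments.
    return TOOLS[call["tool"]](**call["args"])
```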
The agent decomposes a complex goal into a sequence of smaller, manageable sub-tasks before execution. Planning can be upfront (decomposition-first for well-defined problems) or interleaved (plan-act-reflect-repeat for uncertain environments).
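The decomposition-first variant can be sketched in a few lines; `plan` and `execute` stand in for two LLM calls (planner and executor), and the fixed sub-task list is a placeholder:

```python
def plan(goal: str) -> list[str]:
    # Hypothetical planner call: break the goal into ordered sub-tasks.
    return [f"research {goal}", f"outline {goal}", f"write {goal}"]

def execute(step: str) -> str:
    # Hypothetical executor call for a single sub-task.
    return f"done: {step}"

def run(goal: str) -> list[str]:
    return [execute(step) for step in plan(goal)]
```

The interleaved variant would re-invoke `plan` after each `execute` result instead of committing to the full sequence upfront.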
Multiple specialized agents work together, splitting tasks and coordinating results. Each agent has a defined role (researcher, coder, reviewer) and agents may debate, delegate, or vote to reach superior solutions.
Reasoning patterns determine how an agent thinks – whether it commits to a single chain of logic, explores branching alternatives, or iterates on its own reasoning.
The foundational reasoning pattern. The model generates explicit intermediate reasoning steps before producing a final answer, either via few-shot examples or the zero-shot prompt “Let's think step by step.” 3)
Extends CoT by exploring multiple reasoning branches in parallel, evaluating each path, and backtracking when needed. The model maintains a tree of partial solutions and uses search algorithms (BFS/DFS) to find the best path. 4)
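A toy beam-search version of the idea, where `expand` (propose continuations) and `score` (evaluate a partial solution) stand in for LLM calls:

```python
def expand(thought: str) -> list[str]:
    # Hypothetical proposal call: two candidate continuations per thought.
    return [thought + "a", thought + "b"]

def score(thought: str) -> int:
    # Toy evaluation heuristic: prefer 'a'-heavy branches.
    return thought.count("a")

def tree_of_thoughts(root: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [root]
    for _ in range(depth):
        candidates = [c for t in frontier for c in expand(t)]
        # Keep only the best `beam` partial solutions (pruning/backtracking).
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)
```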
Generalizes ToT by modeling reasoning as a directed graph rather than a tree. Thought nodes can be combined, refined, or aggregated from multiple branches, enabling non-linear reasoning flows. 5)
A token-efficient variant of CoT where the model generates minimal, concise intermediate steps rather than verbose reasoning traces. Achieves comparable accuracy to CoT while using significantly fewer tokens. 6)
Samples multiple independent reasoning paths for the same problem, then selects the most common final answer by majority vote. Reduces variance and improves reliability over single-path CoT. 7)
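The voting step is a one-liner over sampled answers; `sample_answer` mocks a noisy model:

```python
from collections import Counter

def sample_answer(question: str, seed: int) -> str:
    # Hypothetical stochastic model: answers "42" on most seeds, "41" otherwise.
    return "42" if seed % 3 else "41"

def self_consistent_answer(question: str, n: int = 5) -> str:
    votes = Counter(sample_answer(question, s) for s in range(n))
    return votes.most_common(1)[0][0]   # majority vote over final answers
```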
Interleaves reasoning traces with tool actions and observations in a structured loop: Thought → Action → Observation → repeat. The critical implementation detail is the stop sequence – generation halts at “Observation:” so the runtime provides real data rather than the model hallucinating it. 8)
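A minimal runtime loop, with a scripted `llm` standing in for the model (its output is assumed already truncated at the stop sequence) and a single `lookup` tool:

```python
import re

def lookup(term: str) -> str:
    return {"capital of France": "Paris"}.get(term, "unknown")

def llm(transcript: str) -> str:
    # Scripted stand-in for the model; a real call would stop at "Observation:".
    if "Observation: Paris" in transcript:
        return "Thought: I have the answer.\nFinal Answer: Paris"
    return "Thought: I should look this up.\nAction: lookup[capital of France]"

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        m = re.search(r"Action: (\w+)\[(.+)\]", step)
        if m:
            obs = {"lookup": lookup}[m.group(1)](m.group(2))
            transcript += f"Observation: {obs}\n"   # runtime supplies real data
    return "no answer"
```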
Extends ReAct with an explicit self-reflection step. After task completion (or failure), the agent generates a verbal critique of its performance and stores it in memory for future attempts, enabling learning across episodes. 9)
A single-model iterative refinement loop: the LLM generates output, critiques it, then refines it – without external feedback. Unlike Reflection with separate evaluator models, Self-Refine uses the same model for all three steps. 10)
Orchestration patterns determine how multiple agents or components coordinate to accomplish complex workflows. These patterns mirror distributed systems architectures. 11)
A central supervisor agent receives user requests, decomposes them into sub-tasks, delegates to specialized worker agents, reviews results, and synthesizes a final response. The supervisor has delegation tools; workers have domain-specific tools. 12)
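Stripped of the LLM calls, the control flow looks like this; the decomposition and the two worker roles are hypothetical placeholders:

```python
# Workers are plain functions standing in for specialized worker agents.
WORKERS = {
    "research": lambda task: f"notes on {task}",
    "write":    lambda task: f"draft covering {task}",
}

def supervisor(request: str) -> str:
    # Hypothetical decomposition step (a real supervisor would ask an LLM).
    subtasks = [("research", request), ("write", request)]
    results = [WORKERS[role](task) for role, task in subtasks]
    return " | ".join(results)   # synthesis of worker outputs
```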
Agents communicate directly with each other without a central coordinator. Each agent decides independently when to hand off work, creating emergent collaborative behavior. This offers high flexibility, but the system is harder to debug and its costs are harder to bound. 13)
A multi-level tree of supervisor agents, where top-level supervisors delegate to mid-level supervisors who further delegate to worker agents. Enables scaling to large agent organizations while maintaining control at each level.
Agents process tasks in a fixed linear order, with each agent's output feeding directly into the next. The simplest multi-agent pattern – an “assembly line” for deterministic workflows. 14)
A coordinator fans out identical or varied sub-tasks to multiple agents in parallel (map phase), then aggregates their results into a unified output (reduce phase). Dramatically reduces latency for parallelizable workloads.
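The fan-out/aggregate shape maps directly onto a thread pool; `summarize` stands in for a worker agent processing one chunk:

```python
from concurrent.futures import ThreadPoolExecutor

def summarize(chunk: str) -> str:
    # Stand-in for a worker agent; toy "summary" is the first word.
    return chunk.split()[0]

def map_reduce(chunks: list[str]) -> str:
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(summarize, chunks))   # map phase (parallel)
    return ", ".join(partials)                         # reduce phase (aggregate)
```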
An LLM-based router classifies incoming requests and directs them to the most appropriate specialized agent or workflow. Acts as an intelligent dispatcher using semantic understanding rather than rule-based routing.
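The dispatch logic is simple once the classification exists; here a keyword check stands in for the LLM classifier, and the agent names are hypothetical:

```python
AGENTS = {
    "billing": lambda q: f"billing agent handles: {q}",
    "tech":    lambda q: f"tech agent handles: {q}",
    "general": lambda q: f"general agent handles: {q}",
}

def classify(query: str) -> str:
    # Keyword stand-in for an LLM-based semantic classifier.
    if "invoice" in query or "refund" in query:
        return "billing"
    if "error" in query or "crash" in query:
        return "tech"
    return "general"

def route(query: str) -> str:
    return AGENTS[classify(query)](query)
```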
Memory patterns determine what an agent remembers and how – from immediate conversational context to persistent cross-session knowledge. Andrew Ng has called memory engineering “one of the most debated topics in AI right now.” 15)
The most basic memory: the LLM's context window holds recent conversation turns, system instructions, and immediate task context. It is volatile and limited by the model's context length, and is managed via truncation, summarization, or sliding-window strategies.
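A sliding-window sketch, using word count as a toy proxy for token count:

```python
from collections import deque

class ShortTermMemory:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.turns: deque[str] = deque()

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        # Evict oldest turns until the (toy, word-count) budget is respected.
        while sum(len(t.split()) for t in self.turns) > self.max_tokens:
            self.turns.popleft()

    def context(self) -> str:
        return "\n".join(self.turns)
```

A summarizing variant would compress evicted turns into a running summary instead of discarding them.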
Persistent storage of facts, preferences, and knowledge across sessions using vector embeddings for semantic retrieval. The agent can store and recall information that persists indefinitely, enabling learning over time.
Records specific past events, interactions, and outcomes as discrete episodes with temporal metadata. Unlike semantic long-term memory which stores facts, episodic memory preserves the narrative context – what happened, when, and what the outcome was. 17)
A temporary workspace for intermediate reasoning artifacts – draft plans, partial results, hypotheses being tested – separate from the main conversation context. Analogous to human working memory for active problem-solving. 18)
Communication patterns govern how agents interact with humans and each other during task execution.
The agent handles routine operations autonomously but escalates edge cases, high-stakes decisions, or low-confidence outputs to a human for review and approval. The level of autonomy can be tuned from full human oversight to sparse supervision.
Agents communicate through structured message protocols, enabling interoperability across different frameworks and organizations. Standardized protocols are replacing ad-hoc integration.
All agents read from and write to a shared knowledge store (the “blackboard”). Agents independently contribute partial solutions, and any agent can build upon another's contributions. Decoupled communication – agents don't need to know about each other.
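A minimal blackboard is just a shared store that agents read and write without knowing about each other; the two agent functions here are hypothetical:

```python
# Shared knowledge store; agents are plain functions operating on it.
blackboard: dict[str, str] = {"task": "translate and summarize"}

def translator(bb: dict[str, str]) -> None:
    bb["translation"] = "hello world"

def summarizer(bb: dict[str, str]) -> None:
    # Builds on another agent's contribution if it exists.
    if "translation" in bb:
        bb["summary"] = bb["translation"].split()[0]

for agent in (translator, summarizer):
    agent(blackboard)
```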
Agents subscribe to event streams and react to relevant events asynchronously. Enables loose coupling – agents can be added or removed without modifying the overall system.
Reliability patterns ensure agents behave predictably and recover gracefully from the inevitable failures in production systems. 19)
When a tool call or LLM request fails, the agent retries with exponentially increasing delays. Handles transient network errors, rate limits, and temporary service outages without manual intervention.
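A generic wrapper, sketched with short delays and a flaky call that fails twice before succeeding:

```python
import time

def with_retries(fn, attempts: int = 4, base_delay: float = 0.01):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                                   # budget exhausted
            time.sleep(base_delay * 2 ** attempt)       # 0.01s, 0.02s, 0.04s, ...

# Usage: a call that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"
```

Production versions usually add jitter to the delay and retry only on error types known to be transient.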
Defines a cascade of backup strategies when the primary approach fails: try a simpler prompt, fall back to a different model, degrade gracefully to a cached response, or escalate to human handoff.
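The cascade reduces to trying strategies in order until one succeeds; the three strategy functions below are hypothetical placeholders:

```python
def ask_primary_model(q: str) -> str:
    raise TimeoutError("primary model unavailable")   # simulated outage

def ask_cheap_model(q: str) -> str:
    return f"cheap-model answer to {q!r}"

def cached_response(q: str) -> str:
    return "stale cached answer"

def fallback_chain(strategies, query: str) -> str:
    last_err = None
    for strategy in strategies:
        try:
            return strategy(query)
        except Exception as err:
            last_err = err            # remember failure, try next strategy
    raise last_err                    # every rung failed: escalate

answer = fallback_chain([ask_primary_model, ask_cheap_model, cached_response], "hi")
```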
After repeated failures of a particular tool or service, the agent stops calling it for a cooldown period rather than continuing to fail. Prevents cascade failures and wasted resources. Borrowed directly from microservice architecture. 20)
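A minimal breaker with the classic closed/open/half-open behavior, sketched without the retry and metrics a production version would carry:

```python
import time

class CircuitBreaker:
    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold      # consecutive failures before opening
        self.cooldown = cooldown        # seconds to stay open
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: tool temporarily disabled")
            self.opened_at = None       # cooldown elapsed: half-open, try again
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0               # success resets the failure count
        return result
```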
Structured checks on agent inputs and outputs: schema validation, toxicity filtering, factual consistency checks, PII detection, and policy compliance. Acts as a safety net between the agent's reasoning and its final output.
Two models serve different roles: a stronger/larger model handles planning and critical decisions, while a smaller/faster/cheaper model handles routine execution. Optimizes the cost-quality tradeoff across the agent workflow.
Efficiency patterns optimize cost, latency, and resource usage – critical concerns as agents scale from prototypes to production.
Exact caching stores responses keyed by identical inputs. Semantic caching uses embedding similarity to reuse responses for semantically equivalent (but differently worded) queries. Both reduce redundant LLM calls, cutting cost and latency.
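A sketch combining both: an exact dictionary hit first, then a "semantic" scan using word overlap where a real system would compare embedding vectors:

```python
def word_overlap(a: str, b: str) -> float:
    # Toy Jaccard similarity standing in for embedding cosine similarity.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

class SemanticCache:
    def __init__(self, threshold: float = 0.5):
        self.store: dict[str, str] = {}
        self.threshold = threshold

    def get(self, query: str):
        if query in self.store:                       # exact hit
            return self.store[query]
        for cached, answer in self.store.items():     # semantic hit
            if word_overlap(query, cached) >= self.threshold:
                return answer
        return None

    def put(self, query: str, answer: str) -> None:
        self.store[query] = answer
```

The threshold trades hit rate against the risk of returning a cached answer to a subtly different question.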
The agent generates multiple candidate responses or tool call paths in parallel, verifying and selecting the fastest valid result. Trades compute for latency – particularly effective when validation is cheap relative to generation.
The agent monitors its token usage, API costs, or wall-clock time and adjusts its reasoning depth accordingly. May use cheaper models for simple sub-tasks, skip optional refinement steps, or invoke early stopping when confident.
The agent invokes multiple independent tools simultaneously rather than sequentially. Requires the orchestration layer to detect independence between tool calls and dispatch them concurrently.
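With async tooling the concurrent dispatch is one `asyncio.gather` call; the two tools below are hypothetical stand-ins with simulated latency:

```python
import asyncio

async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.01)          # stands in for network latency
    return f"sunny in {city}"

async def fetch_news(topic: str) -> str:
    await asyncio.sleep(0.01)
    return f"headlines about {topic}"

async def gather_context() -> list[str]:
    # Independent calls run concurrently instead of being awaited one by one.
    return list(await asyncio.gather(
        fetch_weather("Oslo"),
        fetch_news("energy"),
    ))

results = asyncio.run(gather_context())
```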
| Concern | Start With | Scale To |
|---|---|---|
| Single-agent reasoning | CoT / ReAct | ToT / Reflexion / Self-Consistency |
| Multi-step tasks | Planning + Tool Use | Map-Reduce / Hierarchical Delegation |
| Multi-agent coordination | Supervisor | Hierarchical / Swarm (depending on control needs) |
| Memory | Short-term + RAG | Episodic + Working Memory + Knowledge Graphs |
| Reliability | Guardrails + Retry | Circuit Breaker + Fallback Chains + Dual LLM |
| Efficiency | Caching | Budget-Aware + Parallel Tool Calling + Speculative |