====== Agent Design Patterns ======

Agent design patterns are reusable architectural solutions for building AI agents that reason, act, and collaborate effectively. Just as software engineering has its Gang of Four patterns, the emerging field of agentic AI has converged on a set of proven patterns that dramatically improve agent reliability, capability, and maintainability.

The patterns below range from single-agent reasoning techniques to complex multi-agent orchestration architectures. Many patterns compose naturally — a [[multi_agent_systems|multi-agent system]] might use [[react_framework|ReAct]] for tool calling, [[planning|planning]] for task decomposition, and reflection for output quality.

===== Core Agentic Patterns =====

These are the foundational patterns identified by Andrew Ng and widely adopted across the agent-building community((Andrew Ng, "Four AI Agent Strategies That Improve GPT-4 and GPT-3.5", The Batch, 2024)).

==== Reflection ====

Reflection is a pattern where an agent critiques its own output and iteratively improves it. The agent generates an initial response, then evaluates that response against quality criteria, identifies flaws, and produces an improved version. This self-critique loop can run for a fixed number of iterations or until a quality threshold is met. In empirical tests, adding reflection to GPT-4 improved HumanEval coding benchmark scores from 67% to 88%((Shinn et al., "Reflexion: Language Agents with Verbal Reinforcement Learning", 2023)).
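The self-critique loop can be sketched in a few lines of Python. Here ''generate'', ''critique'', and ''revise'' are hypothetical stand-ins for real LLM calls, not any particular framework's API:

```python
def reflect_loop(task, generate, critique, revise, max_iters=3):
    """Generate a draft, then iteratively critique and revise it."""
    draft = generate(task)
    for _ in range(max_iters):
        feedback = critique(task, draft)
        if feedback is None:  # critic is satisfied -> stop early
            break
        draft = revise(task, draft, feedback)
    return draft

# Toy stand-ins for LLM calls: the critic accepts any revised draft.
generate = lambda task: f"draft for {task}"
critique = lambda task, d: None if d.endswith("v2") else "needs work"
revise = lambda task, d, fb: d + " v2"

print(reflect_loop("sort a list", generate, critique, revise))
```

In a real agent, the critique prompt would state explicit quality criteria, and the loop would also stop when the budget of iterations is exhausted even if the critic is never satisfied.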
| + | |||
| + | ==== Tool Use / ReAct ==== | ||
| + | |||
| + | The Tool Use pattern enables agents to call external tools — APIs, databases, code interpreters, | ||
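A minimal version of the loop looks like the sketch below. The ''scripted_llm'' stub and the ''lookup'' tool are hypothetical; a real implementation would parse the model's structured output and dispatch to registered tools:

```python
def react_loop(question, llm, tools, max_steps=5):
    """Thought-action-observation loop: the model either calls a tool
    or emits a final answer; each observation is appended to the prompt."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(transcript)  # e.g. {"action": "lookup", "input": "..."}
        if step["action"] == "final":
            return step["input"]
        observation = tools[step["action"]](step["input"])
        transcript += f"\nAction: {step['action']}\nObservation: {observation}"
    return None  # step budget exhausted

# Scripted stub model: first looks up a fact, then answers with it.
def scripted_llm(transcript):
    if "Observation" not in transcript:
        return {"action": "lookup", "input": "capital of France"}
    return {"action": "final", "input": transcript.rsplit(": ", 1)[-1]}

tools = {"lookup": lambda q: {"capital of France": "Paris"}[q]}
print(react_loop("What is the capital of France?", scripted_llm, tools))
```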
| + | |||
| + | ==== Planning ==== | ||
| + | |||
| + | Planning is the pattern of decomposing a complex task into a sequence of smaller, manageable subtasks before execution begins. The agent analyzes the overall goal, identifies dependencies between subtasks, determines an execution order, and then works through the plan step by step. Planning separates the "what to do" from the "how to do it," enabling agents to tackle problems that would be too complex to solve in a single pass((Wang et al., " | ||
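The separation of planning from execution can be sketched as follows; ''planner'' and ''executor'' are toy stand-ins for LLM calls:

```python
def plan_and_execute(goal, planner, executor):
    """Decompose the goal into ordered subtasks, then run each one,
    passing earlier results forward so later steps can build on them."""
    subtasks = planner(goal)      # the "what to do"
    results = {}
    for step in subtasks:         # the "how to do it", one step at a time
        results[step] = executor(step, results)
    return results

# Toy planner/executor stand-ins.
planner = lambda goal: ["research", "outline", "write"]
executor = lambda step, prior: f"{step} done (after {len(prior)} steps)"

out = plan_and_execute("write a report", planner, executor)
print(out["write"])
```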
| + | |||
| + | ==== Multi-Agent Collaboration ==== | ||
| + | |||
| + | Multi-Agent Collaboration involves multiple specialized agents working together to accomplish a goal that no single agent could handle alone. Each agent has a defined role, expertise, or capability, and they coordinate through message passing, shared state, or a supervisor. This mirrors how human organizations divide labor among specialists. Use multi-agent patterns when the problem domain is too broad for one agent, when different subtasks require different tool sets or prompts, or when you need debate and verification between independent perspectives. See [[multi_agent_systems|Multi-Agent Systems]] for architectures and frameworks. | ||
| + | |||
==== Human-in-the-Loop ====

Human-in-the-Loop (HITL) is the pattern of incorporating human oversight, approval gates, or feedback injection into the agent's decision loop. The agent pauses at defined checkpoints, such as before irreversible actions or when its confidence is low, and waits for a human decision before proceeding. Use HITL for high-stakes, irreversible, or compliance-sensitive operations. See [[human_in_the_loop|Human-in-the-Loop]].

===== Reasoning Patterns =====

Reasoning patterns structure how an agent thinks through a problem before producing an answer. They operate at the cognitive level, shaping the internal reasoning process.

==== Chain of Thought (CoT) ====

Chain of Thought prompts the model to produce intermediate reasoning steps before arriving at a final answer. By making the reasoning process explicit, CoT dramatically improves performance on math, logic, and multi-step problems. Use CoT whenever the task requires multi-step reasoning or when you need to audit how the agent reached its conclusion. See [[chain_of_thought|Chain of Thought]].

==== Tree of Thoughts (ToT) ====

Tree of Thoughts extends CoT by exploring multiple reasoning paths simultaneously, evaluating intermediate states, and backtracking from unpromising branches((Yao et al., "Tree of Thoughts: Deliberate Problem Solving with Large Language Models", 2023)). Use ToT for problems that benefit from search and lookahead, such as puzzles and constrained generation tasks.

==== Graph of Thoughts (GoT) ====

Graph of Thoughts generalizes Tree of Thoughts by allowing reasoning paths to merge, split, and form arbitrary graph structures. Partial solutions from different branches can be combined, enabling more sophisticated reasoning than strictly tree-shaped exploration((Besta et al., "Graph of Thoughts: Solving Elaborate Problems with Large Language Models", 2023)).

==== Chain of Draft (CoD) ====

Chain of Draft is an efficiency-oriented variant of CoT where the agent produces minimal, abbreviated reasoning steps rather than verbose explanations. Each intermediate step contains only the essential information needed to advance the reasoning. This preserves the accuracy benefits of CoT while significantly reducing token usage and latency. Use CoD when you need CoT-level reasoning quality but are constrained by cost or speed. See [[chain_of_draft|Chain of Draft]].

==== Self-Consistency ====

Self-Consistency generates multiple independent reasoning chains for the same problem and selects the most common answer through majority voting. By sampling diverse reasoning paths, this pattern reduces the chance that a single flawed chain of thought produces an incorrect answer((Wang et al., "Self-Consistency Improves Chain of Thought Reasoning in Language Models", 2022)). Use self-consistency when answer accuracy justifies the extra sampling cost.
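The voting step is simple to implement. In this sketch ''sample'' is a hypothetical stand-in for sampling one reasoning chain at nonzero temperature and extracting its final answer:

```python
from collections import Counter
import itertools

def self_consistency(question, sample, n=5):
    """Sample n independent reasoning chains and majority-vote
    on their final answers."""
    answers = [sample(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stub sampler simulating an occasionally-wrong reasoner.
flaky = itertools.cycle(["42", "42", "41", "42", "40"])
sample = lambda q: next(flaky)

print(self_consistency("6 * 7?", sample))  # "42" wins 3 of 5 votes
```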
| + | |||
| + | ==== ReAct ==== | ||
| + | |||
| + | ReAct interleaves reasoning traces with action execution, creating a tight loop of thought-action-observation. Unlike pure reasoning patterns, ReAct grounds each reasoning step in real observations from tool use or environment interaction. This prevents the hallucination and reasoning drift that can occur in purely internal reasoning chains. Use ReAct as the default pattern for agents that need to interact with external tools or environments. See [[react_framework|ReAct Framework]]. | ||
| + | |||
==== Reflexion ====

Reflexion adds a verbal self-reflection step after task completion, where the agent analyzes what went wrong (or right) and stores these reflections in memory for future attempts. Unlike simple reflection, Reflexion maintains an episodic memory of past failures and successes that persists across task attempts. Use Reflexion for iterative improvement on recurring task types. See [[reflexion|Reflexion]].

==== Self-Refine ====

Self-Refine is a single-agent iterative refinement loop: generate, get feedback, refine. The same model both produces output and critiques it, using structured feedback to guide each revision. Unlike Reflexion, Self-Refine operates within a single task attempt rather than across attempts((Madaan et al., "Self-Refine: Iterative Refinement with Self-Feedback", 2023)).

===== Orchestration Patterns =====

Orchestration patterns define how multiple agents or processing stages are coordinated to accomplish complex workflows.

==== Supervisor / Manager ====

A central supervisor agent receives tasks, delegates them to specialized worker agents, collects results, and synthesizes a final output. The supervisor maintains the overall plan and decides which worker to invoke at each step. This pattern provides clear control flow and is easy to reason about, but the supervisor can become a bottleneck. Use it when you need centralized coordination and a clear chain of command. See [[supervisor_pattern|Supervisor Pattern]].
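The delegate-collect-synthesize flow can be sketched as below. The ''workers'' and ''plan'' here are hypothetical placeholders; in practice each worker would be an LLM call with its own prompt and tools:

```python
def supervisor(task, workers, plan):
    """Delegate each planned step to a named worker and combine results."""
    results = []
    for worker_name, subtask in plan(task):
        results.append(workers[worker_name](subtask))
    return " | ".join(results)  # trivial synthesis step

# Specialized worker agents as plain functions.
workers = {
    "researcher": lambda t: f"facts about {t}",
    "writer": lambda t: f"summary of {t}",
}
plan = lambda task: [("researcher", task), ("writer", task)]

print(supervisor("solar power", workers, plan))
```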
| + | |||
| + | ==== Peer-to-Peer / Swarm ==== | ||
| + | |||
| + | In the swarm pattern, agents operate as equals without a central coordinator. Each agent can communicate with any other agent, and control flows dynamically based on the conversation state. This produces emergent, flexible behavior but can be harder to debug and predict. Use swarm architectures for exploratory tasks or when no single agent has sufficient context to coordinate the others. See [[swarm_pattern|Swarm Pattern]]. | ||
| + | |||
==== Hierarchical Delegation ====

Hierarchical delegation extends the supervisor pattern into multiple levels: a top-level manager delegates to mid-level supervisors, which in turn coordinate their own teams of worker agents. Use hierarchical delegation when a single supervisor would be overwhelmed by the number of workers or the breadth of the task.

==== Pipeline / Sequential ====

The pipeline pattern chains agents in a fixed sequence, where each agent's output becomes the next agent's input. Each stage performs one well-defined transformation, which makes the workflow predictable and easy to test, at the cost of flexibility. Use pipelines when the workflow has a fixed, linear structure.
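A pipeline reduces to function composition. In this sketch each stage is a plain function standing in for one agent:

```python
from functools import reduce

def pipeline(stages, initial):
    """Feed each stage's output into the next stage."""
    return reduce(lambda data, stage: stage(data), stages, initial)

# Each stage performs a single, well-defined transformation.
stages = [
    lambda text: text.strip().lower(),             # normalizer agent
    lambda text: text.replace("colour", "color"),  # copy-editor agent
    lambda text: text.capitalize(),                # formatter agent
]
print(pipeline(stages, "  The Colour of Money  "))
```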
| + | |||
| + | ==== Map-Reduce ==== | ||
| + | |||
| + | Map-Reduce distributes independent subtasks across multiple agents in parallel (map phase), then aggregates their results into a final output (reduce phase). This pattern excels at processing large datasets or document collections where each item can be analyzed independently. Use Map-Reduce when subtasks are independent and parallelizable. See [[map_reduce_pattern|Map-Reduce Pattern]]. | ||
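The two phases can be sketched with a thread pool; here the "map" function is a stand-in for an agent analyzing one document:

```python
from concurrent.futures import ThreadPoolExecutor

def map_reduce(items, map_fn, reduce_fn):
    """Map phase: process items in parallel. Reduce phase: aggregate."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_fn, items))
    return reduce_fn(mapped)

# Each map call would be an independent agent summarizing one document;
# here it just counts words.
docs = ["doc one has words", "doc two", "doc three is longer still"]
total_words = map_reduce(docs, lambda d: len(d.split()), sum)
print(total_words)  # 4 + 2 + 5 = 11
```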
| + | |||
| + | ==== Router / Semantic Routing ==== | ||
| + | |||
| + | A router agent analyzes incoming requests and directs them to the most appropriate specialized agent or pipeline based on the request' | ||
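A keyword-based sketch is shown below; a production semantic router would classify intent with an embedding similarity lookup or an LLM call instead of keyword matching:

```python
def route(request, routes, default):
    """Send the request to the first handler whose keywords match."""
    text = request.lower()
    for keywords, handler in routes:
        if any(k in text for k in keywords):
            return handler(request)
    return default(request)  # no route matched

# Hypothetical downstream agents, represented by name.
routes = [
    (("refund", "billing"), lambda r: "billing-agent"),
    (("bug", "error", "crash"), lambda r: "support-agent"),
]
print(route("The app crashed with an error", routes, lambda r: "general-agent"))
```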
| + | |||
| + | ===== Memory Patterns ===== | ||
| + | |||
| + | Memory patterns define how agents store and retrieve information across and within interactions. | ||
| + | |||
| + | ==== Short-Term Memory (Context Window) ==== | ||
| + | |||
| + | Short-term memory is the agent' | ||
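A common truncation strategy keeps the system message plus as many recent messages as fit. The sketch below uses character counts as a crude proxy for tokens:

```python
def trim_history(messages, max_chars):
    """Keep the system message plus the newest messages that fit
    within the budget (character count stands in for a token limit)."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(len(m["content"]) for m in system)
    for msg in reversed(rest):  # newest first
        if used + len(msg["content"]) > max_chars:
            break
        kept.append(msg)
        used += len(msg["content"])
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "Be terse."},
    {"role": "user", "content": "x" * 50},       # old, oversized message
    {"role": "user", "content": "latest question"},
]
print([m["content"] for m in trim_history(history, max_chars=30)])
```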
| + | |||
| + | ==== Long-Term Memory (Vector Store) ==== | ||
| + | |||
| + | Long-term memory persists information across conversations using external storage, typically a vector database. The agent embeds information into vector representations and retrieves relevant memories via semantic similarity search. This enables agents to accumulate knowledge over time, remember user preferences, | ||
| + | |||
==== Episodic Memory ====

Episodic memory stores records of specific past experiences — complete interactions, decisions and their outcomes, and successful or failed task attempts. Retrieving similar past episodes lets the agent reuse strategies that worked and avoid repeating mistakes.

==== Working Memory / Scratchpad ====

Working memory provides the agent with an explicit scratchpad for storing intermediate results, partial computations, and notes during a multi-step task. Unlike raw conversation history, the scratchpad is explicitly structured and managed by the agent itself.

===== Communication Patterns =====

Communication patterns define how information flows between agents and humans in an agentic system.

==== Human-in-the-Loop ====

Human-in-the-loop communication establishes structured interaction points where the agent requests human input, confirmation, or correction before proceeding. Well-designed interaction points surface exactly the context a human needs to make a quick, informed decision. See [[human_in_the_loop|Human-in-the-Loop]].

==== Agent-to-Agent Messaging ====

Agent-to-agent messaging enables direct communication between agents through structured message protocols. Messages can carry task assignments, status updates, intermediate results, or questions. Structured message schemas keep inter-agent communication debuggable and testable.

==== Shared Blackboard ====

The shared blackboard pattern provides a common knowledge store that all agents can read from and write to. Agents post partial results, observations, and hypotheses to the blackboard, and other agents build on whatever is relevant to their specialty; coordination emerges from the shared data rather than from direct messaging.

==== Event-Driven ====

Event-driven communication uses an event bus or message queue to decouple agent interactions. Agents publish events when they complete actions or detect relevant conditions, and other agents subscribe to events they care about. This pattern enables loose coupling, scalability, and asynchronous workflows, at the cost of harder debugging and more complex delivery guarantees.

===== Reliability Patterns =====

Reliability patterns ensure agent systems behave predictably and recover gracefully from failures.

==== Retry with Backoff ====

When an LLM call, tool invocation, or API request fails, the agent retries with exponentially increasing delays between attempts. This handles transient failures — rate limits, network blips, temporary service outages — without overwhelming the failing service. Use retry with backoff as a baseline reliability pattern for all external calls. See [[retry_patterns|Retry Patterns]].
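A minimal sketch with exponential backoff and jitter follows; the ''sleep'' parameter is injected so tests can skip real delays, and ''flaky_api'' simulates a service that recovers after two failures:

```python
import random
import time

def retry_with_backoff(call, max_attempts=5, base=0.5, sleep=time.sleep):
    """Retry a failing call with exponentially increasing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the failure
            delay = base * (2 ** attempt) * (1 + random.random())
            sleep(delay)  # 0.5-1s, then 1-2s, then 2-4s, ...

# A flaky "API" that fails twice before succeeding.
attempts = {"n": 0}
def flaky_api():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry_with_backoff(flaky_api, sleep=lambda d: None))  # "ok" on 3rd try
```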
| + | |||
| + | ==== Fallback Chains ==== | ||
| + | |||
| + | Fallback chains define a prioritized list of alternative strategies to try when the primary approach fails. For example, if GPT-4 is unavailable, | ||
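The chain itself is just an ordered try-loop. The providers in this sketch are hypothetical stubs; the "primary" one is simulated as down:

```python
def with_fallbacks(providers, request):
    """Try each provider in priority order; raise only if all fail."""
    errors = []
    for name, call in providers:
        try:
            return name, call(request)
        except Exception as exc:
            errors.append((name, exc))  # record and move to the next option
    raise RuntimeError(f"all providers failed: {errors}")

# Hypothetical providers: the primary model is unavailable today.
def primary(req): raise TimeoutError("model overloaded")
def secondary(req): return f"answer to {req!r}"

providers = [("primary", primary), ("secondary", secondary)]
print(with_fallbacks(providers, "hello"))
```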
| + | |||
| + | ==== Circuit Breaker ==== | ||
| + | |||
| + | The circuit breaker pattern monitors failure rates for external services and temporarily stops calling a service that is consistently failing. After a cooldown period, the circuit breaker allows a test request through to see if the service has recovered. This prevents cascading failures and wasted resources on calls that are unlikely to succeed. Use circuit breakers when your agent depends on external services with variable reliability. See [[circuit_breaker|Circuit Breaker]]. | ||
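A simplified sketch is below. For brevity it counts skipped calls instead of tracking wall-clock cooldown time, which a production breaker would do:

```python
class CircuitBreaker:
    """Open after `threshold` consecutive failures; while open, skip
    calls until `cooldown` skips have elapsed, then allow one test call."""
    def __init__(self, threshold=3, cooldown=5):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures = 0
        self.skips = 0

    def call(self, fn):
        if self.failures >= self.threshold:   # circuit is open
            if self.skips < self.cooldown:
                self.skips += 1
                raise RuntimeError("circuit open, call skipped")
            self.skips = 0                    # half-open: let one call through
        try:
            result = fn()
        except Exception:
            self.failures += 1                # failure keeps the circuit open
            raise
        self.failures = 0                     # success closes the circuit
        return result

breaker = CircuitBreaker(threshold=2, cooldown=2)
def down(): raise ConnectionError("service down")

states = []
for _ in range(4):
    try:
        breaker.call(down)
        states.append("ok")
    except ConnectionError:
        states.append("failed")
    except RuntimeError:
        states.append("skipped")
print(states)  # ['failed', 'failed', 'skipped', 'skipped']
```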
| + | |||
| + | ==== Guardrails / Validation ==== | ||
| + | |||
| + | Guardrails enforce constraints on agent inputs and outputs through validation layers. Input guardrails filter or reject harmful, off-topic, or malformed requests. Output guardrails validate that agent responses meet format requirements, | ||
| + | |||
==== Dual LLM (Planner + Executor) ====

The dual LLM pattern separates planning from execution using two different models or prompts. A capable, expensive model handles high-level planning and decision-making, while a cheaper, faster model executes the individual steps. This balances capability against cost. Use the dual LLM pattern when execution steps are simple relative to the planning that produces them.

===== Efficiency Patterns =====

Efficiency patterns reduce the cost, latency, and resource consumption of agent systems.

==== Caching (Semantic + Exact) ====

Caching stores the results of previous LLM calls or tool invocations for reuse. Exact caching returns stored results when the input matches precisely. Semantic caching uses embedding similarity to return cached results for inputs that are semantically equivalent but not identical. Caching can dramatically reduce costs and latency for repetitive queries. Use caching when the agent handles recurring or similar requests. See [[agent_caching|Agent Caching]].
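An exact-match cache can be sketched as a decorator keyed on a hash of the prompt and parameters; a semantic cache would instead look up the nearest stored embedding. The ''fake_llm'' stub counts how many real calls get through:

```python
import hashlib
import json

def make_cached(llm_call):
    """Wrap an LLM call with an exact-match cache keyed on a hash of
    the prompt and its parameters."""
    cache = {}
    def cached(prompt, **params):
        raw = json.dumps([prompt, params], sort_keys=True).encode()
        key = hashlib.sha256(raw).hexdigest()
        if key not in cache:
            cache[key] = llm_call(prompt, **params)  # cache miss
        return cache[key]
    return cached

calls = {"n": 0}
def fake_llm(prompt, temperature=0.0):
    calls["n"] += 1
    return f"response to {prompt}"

ask = make_cached(fake_llm)
ask("hello"); ask("hello"); ask("hello", temperature=0.7)
print(calls["n"])  # 2: the repeated "hello" was served from cache
```

Note that different parameters (here a changed ''temperature'') produce a different key, so they do not share cache entries.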
| + | |||
| + | ==== Speculative Execution ==== | ||
| + | |||
| + | Speculative execution runs multiple possible next steps in parallel before knowing which one will actually be needed. When the decision point is reached, the pre-computed result for the chosen path is available immediately. This trades compute cost for reduced latency. Use speculative execution when latency is critical and the set of possible next steps is small and predictable. See [[speculative_execution|Speculative Execution]]. | ||
| + | |||
==== Budget-Aware Reasoning ====

Budget-aware reasoning constrains the agent's reasoning depth, tool calls, or token spend to an explicit budget. The agent tracks its consumption as it works and adapts its strategy, for example by switching to cheaper shortcuts when the budget runs low. Use budget-aware reasoning in production systems with hard cost or latency limits.

==== Parallel Tool Calling ====

Parallel tool calling executes multiple independent tool invocations simultaneously rather than sequentially. When the agent identifies that several pieces of information are needed and the requests are independent, it issues the calls concurrently and waits for all the results together. This can substantially reduce end-to-end latency for I/O-bound steps.
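With thread-based concurrency the pattern is a few lines. The ''slow_tool'' here is a stand-in for an I/O-bound tool call such as an API request:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def call_tools_parallel(calls):
    """Run independent tool calls concurrently; collect results by name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, *args) for name, fn, args in calls}
        return {name: f.result() for name, f in futures.items()}

# A stand-in tool that spends ~50 ms waiting on I/O.
def slow_tool(value):
    time.sleep(0.05)
    return value.upper()

results = call_tools_parallel([
    ("weather", slow_tool, ("sunny",)),
    ("news", slow_tool, ("headline",)),
])
print(results)
```

Because both calls sleep concurrently, the total wall time is close to one call's latency rather than the sum of both.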
| + | |||
| + | ===== See Also ===== | ||
| + | |||
| + | * [[react_framework|ReAct Framework]] | ||
| + | * [[multi_agent_systems|Multi-Agent Systems]] | ||
| + | * [[planning|Planning]] | ||
| + | * [[tool_use|Tool Use]] | ||
| + | * [[human_in_the_loop|Human-in-the-Loop]] | ||
| + | * [[chain_of_thought|Chain of Thought]] | ||
| + | * [[critic_self_correction|Critic & Self-Correction]] | ||
| + | * [[guardrails|Guardrails & Validation]] | ||
| + | * [[agentic_ai|Agentic AI Overview]] | ||
| + | |||
| + | ===== References ===== | ||