Differences

This shows you the differences between two versions of the page.

--- agent_design_patterns [2026/03/31 02:02] – Comprehensive AI Agent Design Patterns index page agent
+++ agent_design_patterns [2026/03/31 02:04] (current) – Create comprehensive agent design patterns article agent
@@ Line 1: / Line 1: @@
-====== AI Agent Design Patterns ======
+====== Agent Design Patterns ======
-**AI Agent Design Patterns** are reusable architectural solutions for building autonomous AI systems that perceive, reason, plan, act, and learn. As the field has matured from simple prompt-response interactions to goal-directed agentic systems, a rich taxonomy of patterns has emerged across reasoning, orchestration, memory, communication, reliability, and efficiency concerns.
+Agent design patterns are reusable architectural solutions for building AI agents that reason, act, and collaborate effectively. Just as software engineering has its Gang of Four patterns, the emerging field of agentic AI has converged on a set of proven patterns that dramatically improve agent reliability, capability, and efficiency((Andrew Ng, "Agentic Design Patterns" talk, Sequoia Capital AI Ascent, March 2024 [[https://www.youtube.com/watch?v=sal78ACtGTc|YouTube]])). This article serves as a definitive index of all major agent design patterns, organized by category.
-This page serves as the **definitive index** of agent design patterns. Each pattern includes a brief description, guidance on when to use it, and the key frameworks that implement it.
+The patterns below range from single-agent reasoning techniques to complex multi-agent orchestration architectures. Many patterns compose naturally — a [[multi_agent_systems|multi-agent system]] might use [[react_framework|ReAct]] for tool calling, [[planning|planning]] for task decomposition, and [[human_in_the_loop|human-in-the-loop]] gates for safety-critical decisions.
-> "I think AI agent workflows will drive massive AI progress this year -- perhaps even more than the next generation of foundation models." -- **Andrew Ng**, 2024 ((source [[https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/|DeepLearning.AI -- How Agents Can Improve LLM Performance]]))
+===== Core Agentic Patterns =====
-----
+These are the foundational patterns identified by Andrew Ng and widely adopted across the agent-building community((Andrew Ng, "Four AI Agent Strategies That Improve GPT-4 and GPT-3.5", DeepLearning.AI, The Batch newsletter, 2024 [[https://www.deeplearning.ai/the-batch/|DeepLearning.AI]])). They represent the building blocks from which more complex agent architectures are composed.
-===== Core Agentic Patterns (Andrew Ng) =====
-In early 2024, Andrew Ng identified four foundational agentic design patterns that enable LLM-based agents to dramatically outperform zero-shot prompting -- in some cases allowing GPT-3.5 to surpass GPT-4 on coding benchmarks. ((source [[https://mlnotes.substack.com/p/4-agentic-design-patterns-and-4-key|MLNotes -- 4 Agentic Design Patterns and 4 Key Trends]]))
 ==== Reflection ====
-An agent critiques its own output and uses that feedback to iteratively improve. The LLM generates a response, evaluates it for weaknesses (factual errors, missing detail, logical flaws), then revises -- repeating until a quality threshold is met.
+Reflection is a pattern where an agent critiques its own output and iteratively improves it. The agent generates an initial response, then evaluates that response against quality criteria, identifies flaws, and produces an improved version. This self-critique loop can run for a fixed number of iterations or until a quality threshold is met. In empirical tests, adding reflection to GPT-4 improved HumanEval coding benchmark scores from 67% to 88%((Shinn et al., "Reflexion: Language Agents with Verbal Reinforcement Learning", NeurIPS 2023 [[https://arxiv.org/abs/2303.11366|arXiv]])). Use reflection when output quality matters more than latency, and when the task has verifiable quality criteria. See [[critic_self_correction|Critic & Self-Correction]] for detailed coverage.
-  * **When to use:** Writing, code generation, analysis tasks where iterative refinement yields measurably better output.
-  * **Frameworks:** LangGraph (Evaluator-Optimizer workflow), Reflexion, AutoGen self-critique loops.
-  * **See also:** [[reflexion|Reflexion]]
 ==== Tool Use / ReAct ====
-The agent dynamically selects and invokes external tools (APIs, databases, search engines, code interpreters) during its reasoning loop. Rather than relying solely on parametric knowledge, the agent extends its capabilities through function calling.
+The Tool Use pattern enables agents to call external tools — APIs, databases, code interpreters, search engines — within a structured reason-act loop. The agent reasons about what information or action is needed, selects and invokes the appropriate tool, observes the result, and continues reasoning. This grounds the agent in real-world data and extends its capabilities far beyond text generation alone((Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models", ICLR 2023 [[https://arxiv.org/abs/2210.03629|arXiv]])). Use this pattern whenever the agent needs access to external information or must take actions in the world. See [[react_framework|ReAct Framework]] and [[tool_use|Tool Use]] for implementation details.
-  * **When to use:** Any task requiring access to current data, computation, or external systems.
-  * **Frameworks:** LangChain tools, LlamaIndex, OpenAI function calling, Anthropic tool use, Gemini ADK.
-  * **See also:** [[react_framework|ReAct Framework]]
 ==== Planning ====
-The agent decomposes a complex goal into a sequence of smaller, manageable sub-tasks before execution. Planning can be upfront (decomposition-first for well-defined problems) or interleaved (plan-act-reflect-repeat for uncertain environments).
+Planning is the pattern of decomposing a complex task into a sequence of smaller, manageable subtasks before execution begins. The agent analyzes the overall goal, identifies dependencies between subtasks, determines an execution order, and then works through the plan step by step. Planning separates the "what to do" from the "how to do it," enabling agents to tackle problems that would be too complex to solve in a single pass((Wang et al., "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models", ACL 2023 [[https://arxiv.org/abs/2305.04091|arXiv]])). Use planning for multi-step tasks with dependencies or when the solution path is not immediately obvious. See [[planning|Planning]] and [[plan_and_execute_agents|Plan-and-Execute Agents]].
-  * **When to use:** Multi-step tasks, research workflows, software development, any problem requiring coordinated sub-goals.
-  * **Frameworks:** LangGraph Plan-and-Execute, LlamaIndex query planning, AutoGen task decomposition.
-  * **See also:** [[planning|Planning]]
 ==== Multi-Agent Collaboration ====
-Multiple specialized agents work together, splitting tasks and coordinating results. Each agent has a defined role (researcher, coder, reviewer) and agents may debate, delegate, or vote to reach superior solutions.
+Multi-Agent Collaboration involves multiple specialized agents working together to accomplish a goal that no single agent could handle alone. Each agent has a defined role, expertise, or capability, and they coordinate through message passing, shared state, or a supervisor. This mirrors how human organizations divide labor among specialists. Use multi-agent patterns when the problem domain is too broad for one agent, when different subtasks require different tool sets or prompts, or when you need debate and verification between independent perspectives. See [[multi_agent_systems|Multi-Agent Systems]] for architectures and frameworks.
-  * **When to use:** Complex workflows exceeding a single agent's context or capability, tasks benefiting from diverse perspectives.
+==== Human-in-the-Loop ====
-  * **Frameworks:** CrewAI, AutoGen, LangGraph multi-agent, OpenAI Swarm, Camel.
-  * **See also:** [[multi_agent_systems|Multi-Agent Systems]]
-----
+Human-in-the-Loop (HITL) is the pattern of incorporating human oversight, approval gates, or feedback injection into the agent's workflow. Rather than running fully autonomously, the agent pauses at defined checkpoints to request human review, confirmation of high-stakes actions, or corrective feedback. This pattern is essential for production deployments where errors carry real consequences((Shneiderman, "Human-Centered AI", Oxford University Press, 2022)). Use HITL for safety-critical decisions, actions with irreversible consequences, or when building trust during initial deployment. See [[human_in_the_loop|Human-in-the-Loop]].
 ===== Reasoning Patterns =====
-Reasoning patterns determine **how an agent thinks** -- whether it commits to a single chain of logic, explores branching alternatives, or iterates on its own reasoning.
+Reasoning patterns structure how an agent thinks through a problem before producing an answer. They operate at the cognitive level, shaping the internal reasoning process.
 ==== Chain of Thought (CoT) ====
-The foundational reasoning pattern. The model generates explicit intermediate reasoning steps before producing a final answer, either via few-shot examples or the zero-shot prompt "Let's think step by step." ((source [[https://arxiv.org/abs/2201.11903|Wei et al. 2022 -- Chain-of-Thought Prompting]]))
+Chain of Thought prompts the model to produce intermediate reasoning steps before arriving at a final answer. By making the reasoning process explicit, CoT dramatically improves performance on math, logic, and multi-step problems. Use CoT whenever the task requires multi-step reasoning or when you need to audit how the agent reached its conclusion. See [[chain_of_thought|Chain of Thought]].
-  * **When to use:** Arithmetic, commonsense reasoning, any task where step-by-step reasoning reduces errors.
-  * **Frameworks:** Native in all major LLMs; explicitly supported in LangChain, Gemini ADK.
 ==== Tree of Thoughts (ToT) ====
-Extends CoT by exploring multiple reasoning branches in parallel, evaluating each path, and backtracking when needed. The model maintains a tree of partial solutions and uses search algorithms (BFS/DFS) to find the best path. ((source [[https://arxiv.org/abs/2305.10601|Yao et al. 2023 -- Tree of Thoughts]]))
+Tree of Thoughts extends CoT by exploring multiple reasoning paths simultaneously, branching at decision points and evaluating which paths are most promising. The agent can backtrack from dead ends and explore alternatives, mimicking how humans consider multiple approaches to a problem((Yao et al., "Tree of Thoughts: Deliberate Problem Solving with Large Language Models", NeurIPS 2023 [[https://arxiv.org/abs/2305.10601|arXiv]])). Use ToT for problems with large solution spaces or where the first reasoning path may not be optimal. See [[tree_of_thoughts|Tree of Thoughts]].
-  * **When to use:** Creative problem-solving, puzzle-solving, tasks with multiple valid approaches where exploration improves outcomes.
-  * **Frameworks:** LangChain ToT, custom implementations with LangGraph.
 ==== Graph of Thoughts (GoT) ====
-Generalizes ToT by modeling reasoning as a directed graph rather than a tree. Thought nodes can be combined, refined, or aggregated from multiple branches, enabling non-linear reasoning flows. ((source [[https://arxiv.org/abs/2308.09687|Besta et al. 2023 -- Graph of Thoughts]]))
+Graph of Thoughts generalizes Tree of Thoughts by allowing reasoning paths to merge, split, and form arbitrary graph structures. Partial solutions from different branches can be combined, enabling more sophisticated reasoning than strictly tree-shaped exploration((Besta et al., "Graph of Thoughts: Solving Elaborate Problems with Large Language Models", AAAI 2024 [[https://arxiv.org/abs/2308.09687|arXiv]])). Use GoT for problems where partial solutions can be meaningfully combined. See [[graph_of_thoughts|Graph of Thoughts]].
-  * **When to use:** Complex problems where insights from different reasoning paths should be merged (e.g., document summarization, multi-constraint optimization).
-  * **Frameworks:** Research implementations; emerging LangGraph support.
-==== Chain of Draft ====
-A token-efficient variant of CoT where the model generates minimal, concise intermediate steps rather than verbose reasoning traces. Achieves comparable accuracy to CoT while using significantly fewer tokens. ((source [[https://arxiv.org/abs/2502.18600|Chain of Draft 2025]]))
+==== Chain of Draft (CoD) ====
-  * **When to use:** Cost-sensitive or latency-sensitive applications where full CoT is too expensive.
+Chain of Draft is an efficiency-oriented variant of CoT where the agent produces minimal, abbreviated reasoning steps rather than verbose explanations. Each intermediate step contains only the essential information needed to advance the reasoning. This preserves the accuracy benefits of CoT while significantly reducing token usage and latency. Use CoD when you need CoT-level reasoning quality but are constrained by cost or speed. See [[chain_of_draft|Chain of Draft]].
-  * **Frameworks:** Prompt-level technique applicable to any LLM.
 ==== Self-Consistency ====
-Samples multiple independent reasoning paths for the same problem, then selects the most common final answer by majority vote. Reduces variance and improves reliability over single-path CoT. ((source [[https://arxiv.org/abs/2203.11171|Wang et al. 2022 -- Self-Consistency]]))
+Self-Consistency generates multiple independent reasoning chains for the same problem and selects the most common answer through majority voting. By sampling diverse reasoning paths, this pattern reduces the chance that a single flawed chain of thought produces an incorrect answer((Wang et al., "Self-Consistency Improves Chain of Thought Reasoning in Language Models", ICLR 2023 [[https://arxiv.org/abs/2203.11171|arXiv]])). Use self-consistency when correctness is paramount and you can afford the additional compute cost. See [[self_consistency|Self-Consistency]].
-  * **When to use:** High-stakes decisions where a single reasoning trace may be unreliable; math and logic problems.
+==== ReAct ====
-  * **Frameworks:** Prompt-level technique; supported in LangChain via parallel chain execution.
-==== ReAct (Reason + Act) ====
+ReAct interleaves reasoning traces with action execution, creating a tight loop of thought-action-observation. Unlike pure reasoning patterns, ReAct grounds each reasoning step in real observations from tool use or environment interaction. This prevents the hallucination and reasoning drift that can occur in purely internal reasoning chains. Use ReAct as the default pattern for agents that need to interact with external tools or environments. See [[react_framework|ReAct Framework]].
-Interleaves reasoning traces with tool actions and observations in a structured loop: Thought -> Action -> Observation -> repeat. The critical implementation detail is the **stop sequence** -- generation halts at "Observation:" so the runtime provides real data rather than the model hallucinating it. ((source [[https://arxiv.org/abs/2210.03629|Yao et al. ICLR 2023 -- ReAct]]))
-  * **When to use:** The default agent pattern for approximately 90% of production agents. Any task combining reasoning with external tool access.
-  * **Frameworks:** LangChain, LangGraph, LlamaIndex, Gemini ADK, Anthropic Claude agents.
-  * **See also:** [[react_framework|ReAct Framework]]
 ==== Reflexion ====
-Extends ReAct with an explicit self-reflection step. After task completion (or failure), the agent generates a verbal critique of its performance and stores it in memory for future attempts, enabling learning across episodes. ((source [[https://arxiv.org/abs/2303.11366|Shinn et al. 2023 -- Reflexion]]))
+Reflexion adds a verbal self-reflection step after task completion, where the agent analyzes what went wrong (or right) and stores these reflections in memory for future attempts. Unlike simple reflection, Reflexion maintains an episodic memory of past failures and successes that persists across task attempts. Use Reflexion for iterative improvement on recurring task types. See [[reflexion|Reflexion]].
-  * **When to use:** Multi-attempt tasks like coding challenges, where learning from mistakes significantly improves success rate.
-  * **Frameworks:** LangGraph, custom implementations.
-  * **See also:** [[reflexion|Reflexion]]
 ==== Self-Refine ====
-A single-model iterative refinement loop: the LLM generates output, critiques it, then refines it -- without external feedback. Unlike Reflection with separate evaluator models, Self-Refine uses the same model for all three steps. ((source [[https://arxiv.org/abs/2303.17651|Madaan et al. 2023 -- Self-Refine]]))
+Self-Refine is a single-agent iterative refinement loop: generate, get feedback, refine. The same model both produces output and critiques it, using structured feedback to guide each revision. Unlike Reflexion, Self-Refine operates within a single task attempt rather than across attempts((Madaan et al., "Self-Refine: Iterative Refinement with Self-Feedback", NeurIPS 2023 [[https://arxiv.org/abs/2303.17651|arXiv]])). Use Self-Refine for tasks where quality improves measurably with iteration, such as code generation or creative writing. See [[self_refine|Self-Refine]].
-  * **When to use:** Tasks where iterative polishing improves quality (writing, code review, translation) and a single model suffices.
-  * **Frameworks:** Prompt-level technique; easily implemented in any orchestration framework.
-----
 ===== Orchestration Patterns =====
-Orchestration patterns determine **how multiple agents or components coordinate** to accomplish complex workflows. These patterns mirror distributed systems architectures. ((source [[https://gurusup.com/blog/agent-orchestration-patterns|GuruSup -- Agent Orchestration Patterns 2026]]))
+Orchestration patterns define how multiple agents or processing stages are coordinated to accomplish complex workflows.
-==== Supervisor / Manager Pattern ====
-A central supervisor agent receives user requests, decomposes them into sub-tasks, delegates to specialized worker agents, reviews results, and synthesizes a final response. The supervisor has delegation tools; workers have domain-specific tools. ((source [[https://tutorialq.com/ai/multi-agent/supervisor-and-hierarchical|TutorialQ -- Supervisor and Hierarchical Patterns]]))
+==== Supervisor / Manager ====
-  * **When to use:** Production systems requiring deterministic control, quality review of worker outputs, and clear routing logic.
+A central supervisor agent receives tasks, delegates them to specialized worker agents, collects results, and synthesizes a final output. The supervisor maintains the overall plan and decides which worker to invoke at each step. This pattern provides clear control flow and is easy to reason about, but the supervisor can become a bottleneck. Use it when you need centralized coordination and a clear chain of command. See [[supervisor_pattern|Supervisor Pattern]].
-  * **Frameworks:** LangGraph (supervisor mode), AutoGen, CrewAI (manager agent).
 ==== Peer-to-Peer / Swarm ====
-Agents communicate directly with each other without a central coordinator. Each agent decides independently when to hand off work, creating emergent collaborative behavior. Offers high flexibility but harder to debug and bound costs. ((source [[https://singhajit.com/multi-agent-ai-swarms-system-design/|Ajit Singh -- Architecting Multi-Agent AI Swarms]]))
+In the swarm pattern, agents operate as equals without a central coordinator. Each agent can communicate with any other agent, and control flows dynamically based on the conversation state. This produces emergent, flexible behavior but can be harder to debug and predict. Use swarm architectures for exploratory tasks or when no single agent has sufficient context to coordinate the others. See [[swarm_pattern|Swarm Pattern]].
-  * **When to use:** Creative/exploratory tasks, brainstorming, tasks where rigid structure would limit emergent solutions.
-  * **Frameworks:** OpenAI Swarm, AutoGen group chat, Camel.
 ==== Hierarchical Delegation ====
-A multi-level tree of supervisor agents, where top-level supervisors delegate to mid-level supervisors who further delegate to worker agents. Enables scaling to large agent organizations while maintaining control at each level.
+Hierarchical delegation extends the supervisor pattern into multiple levels: a top-level manager delegates to mid-level supervisors, who in turn delegate to specialized workers. This mirrors organizational hierarchies and scales to very complex tasks. Use hierarchical delegation when the problem naturally decomposes into domains and sub-domains. See [[hierarchical_delegation|Hierarchical Delegation]].
-  * **When to use:** Enterprise-scale workflows with clearly defined organizational structure (e.g., research then analysis then writing then review).
-  * **Frameworks:** LangGraph (nested subgraphs), AutoGen hierarchical teams, CrewAI hierarchical process.
 ==== Pipeline / Sequential ====
-Agents process tasks in a fixed linear order, with each agent's output feeding directly into the next. The simplest multi-agent pattern -- an "assembly line" for deterministic workflows. ((source [[https://www.linkedin.com/posts/genai-mindsoftomorrow_ai-agenticai-machinelearning-activity-7436389604649259008-5cFX|LinkedIn -- 5 Multi-Agent Patterns 2026]]))
+The pipeline pattern chains agents in a fixed sequence, where each agent's output becomes the next agent's input. Each stage performs a specific transformation or enrichment. Pipelines are simple, predictable, and easy to test. Use them when the task naturally decomposes into ordered stages with clear interfaces. See [[pipeline_pattern|Pipeline Pattern]].
-  * **When to use:** Data ETL, document processing pipelines, any workflow with clear sequential dependencies.
-  * **Frameworks:** LangChain LCEL, LangGraph, Haystack pipelines, CrewAI sequential process.
-==== Map-Reduce for Agents ====
-A coordinator fans out identical or varied sub-tasks to multiple agents in parallel (map phase), then aggregates their results into a unified output (reduce phase). Dramatically reduces latency for parallelizable workloads.
-  * **When to use:** Bulk document analysis, parallel research across multiple sources, any task with independent sub-problems.
-  * **Frameworks:** LangGraph (fan-out/fan-in), Anthropic orchestrator-workers pattern, CrewAI.
-==== Router Pattern (Semantic Routing) ====
+==== Map-Reduce ====
-An LLM-based router classifies incoming requests and directs them to the most appropriate specialized agent or workflow. Acts as an intelligent dispatcher using semantic understanding rather than rule-based routing.
+Map-Reduce distributes independent subtasks across multiple agents in parallel (map phase), then aggregates their results into a final output (reduce phase). This pattern excels at processing large datasets or document collections where each item can be analyzed independently. Use Map-Reduce when subtasks are independent and parallelizable. See [[map_reduce_pattern|Map-Reduce Pattern]].
-  * **When to use:** Systems handling diverse request types that require different capabilities or toolsets.
+==== Router / Semantic Routing ====
-  * **Frameworks:** LangGraph conditional edges, Anthropic router workflow, Semantic Router library.
-----
+A router agent analyzes incoming requests and directs them to the most appropriate specialized agent or pipeline based on the request's content, intent, or complexity. This avoids the overhead of engaging all agents for every request. Use routing when you have diverse request types that require different processing strategies. See [[router_pattern|Router Pattern]].
 ===== Memory Patterns =====
-Memory patterns determine **what an agent remembers and how** -- from immediate conversational context to persistent cross-session knowledge. Andrew Ng has called memory engineering "one of the most debated topics in AI right now." ((source [[https://blogs.oracle.com/developers/oracle-and-deeplearning-ai-launch-new-agent-memory-course-for-ai-developers|Oracle -- Agent Memory Course with DeepLearning.AI 2026]]))
+Memory patterns define how agents store and retrieve information across and within interactions.
 ==== Short-Term Memory (Context Window) ====
-The most basic memory: the LLM's context window holds recent conversation turns, system instructions, and immediate task context. Volatile and limited by model token constraints, managed via truncation, summarization, or sliding window strategies.
+Short-term memory is the agent's immediate conversational context — the current prompt and recent exchanges that fit within the LLM's context window. It is inherently limited by the model's maximum token count and is lost when the conversation ends. Effective short-term memory management involves summarizing older context, prioritizing recent and relevant information, and structuring prompts to maximize the utility of available tokens. See [[short_term_memory|Short-Term Memory]].
-  * **When to use:** Always present; the foundation all other memory types build upon.
-  * **Frameworks:** Native in all LLMs; LangChain ConversationBufferMemory, LlamaIndex chat history.
 ==== Long-Term Memory (Vector Store) ====
-Persistent storage of facts, preferences, and knowledge across sessions using vector embeddings for semantic retrieval. The agent can store and recall information that persists indefinitely, enabling learning over time.
+Long-term memory persists information across conversations using external storage, typically a vector database. The agent embeds information into vector representations and retrieves relevant memories via semantic similarity search. This enables agents to accumulate knowledge over time, remember user preferences, and build on past interactions. Use long-term memory for personalization, knowledge accumulation, and cross-session continuity. See [[long_term_memory|Long-Term Memory]].
-  * **When to use:** Personalization, knowledge accumulation, cross-session continuity, RAG-enhanced agents.
-  * **Frameworks:** Mem0, Zep, Letta (MemGPT), Cognee, Qdrant, Chroma, Pinecone, pgvector. ((source [[https://machinelearningmastery.com/the-6-best-ai-agent-memory-frameworks-you-should-try-in-2026/|Machine Learning Mastery -- 6 Best Agent Memory Frameworks 2026]]))
 ==== Episodic Memory ====
-Records specific past events, interactions, and outcomes as discrete episodes with temporal metadata. Unlike semantic long-term memory which stores facts, episodic memory preserves the **narrative context** -- what happened, when, and what the outcome was. ((source [[https://medium.com/@richardhightower/agent-memory-the-key-to-salient-episodic-memory-for-ai-agents-70b0f8e296db|Hightower 2026 -- Salient Episodic Memory for AI Agents]]))
+Episodic memory stores records of specific past experiences — complete interactions, task attempts, successes, and failures — rather than just extracted facts. The agent can recall how it handled similar situations in the past, what worked, and what did not. This supports learning from experience and avoids repeating mistakes. Use episodic memory when the agent performs recurring tasks and should improve over time. See [[episodic_memory|Episodic Memory]].
-  * **When to use:** Agents that need to learn from past successes/failures, avoid repeating mistakes, or reference prior interactions.
-  * **Frameworks:** Mem0, Zep/Graphiti (temporal knowledge graphs), LangGraph checkpointing.
 ==== Working Memory / Scratchpad ====
-A temporary workspace for intermediate reasoning artifacts -- draft plans, partial results, hypotheses being tested -- separate from the main conversation context. Analogous to human working memory for active problem-solving. ((source [[https://www.marktechpost.com/2025/11/10/comparing-memory-systems-for-llm-agents-vector-graph-and-event-logs/|MarkTechPost -- Comparing Memory Systems for LLM Agents]]))
+Working memory provides the agent with an explicit scratchpad for storing intermediate results, partial computations, and temporary state during complex reasoning. Unlike short-term memory, the scratchpad is structured and the agent can read, write, and organize it deliberately. Use working memory for multi-step computations, complex data transformations, or any task where intermediate state must be tracked explicitly. See [[working_memory|Working Memory]].
-  * **When to use:** Complex multi-step reasoning, plan-and-execute workflows, any task requiring intermediate state management.
-  * **Frameworks:** LangGraph state management, Redis-backed scratchpads, MemGPT/Letta virtual context.
-----
 ===== Communication Patterns =====
-Communication patterns govern **how agents interact with humans and each other** during task execution.
+Communication patterns define how information flows between agents and humans in an agentic system.
-==== Human-in-the-Loop (HITL) ====
-The agent handles routine operations autonomously but escalates edge cases, high-stakes decisions, or low-confidence outputs to a human for review and approval. The level of autonomy can be tuned from full human oversight to sparse supervision.
+==== Human-in-the-Loop ====
-  * **When to use:** Production systems in regulated industries, high-consequence decisions, building user trust during early deployment.
+Human-in-the-loop communication establishes structured interaction points where the agent requests human input, confirmation, or correction. This can range from simple approval gates to rich collaborative workflows where the human and agent iterate together. Effective HITL design minimizes human cognitive load while maximizing oversight of critical decisions. See [[human_in_the_loop|Human-in-the-Loop]].
-  * **Frameworks:** LangGraph interrupt/approval nodes, CrewAI human input tool, Anthropic tool use with confirmation.
 ==== Agent-to-Agent Messaging ====
-Agents communicate through structured message protocols, enabling interoperability across different frameworks and organizations. Standardized protocols are replacing ad-hoc integration.
+Agent-to-agent messaging enables direct communication between agents through structured message protocols. Messages can carry task assignments, results, queries, or coordination signals. Well-designed messaging protocols include clear message schemas, routing rules, and error handling. Use structured messaging when agents need to coordinate closely or share complex information. See [[agent_messaging|Agent-to-Agent Messaging]].
-  * **When to use:** Cross-framework agent collaboration, enterprise systems with agents from different vendors.
-  * **Frameworks:** Google A2A (Agent-to-Agent protocol), Anthropic MCP (Model Context Protocol), FIPA-ACL.
 ==== Shared Blackboard ====
-All agents read from and write to a shared knowledge store (the "blackboard"). Agents independently contribute partial solutions, and any agent can build upon another's contributions. Decoupled communication -- agents don't need to know about each other.
+The shared blackboard pattern provides a common knowledge store that all agents can read from and write to. Agents post partial results, observations, and hypotheses to the blackboard, and other agents react to relevant updates. This decouples agents from each other — they interact through the shared state rather than direct messages. Use the blackboard pattern for collaborative problem-solving where multiple agents contribute partial solutions. See [[blackboard_pattern|Blackboard Pattern]].
-  * **When to use:** Problems requiring incremental, collaborative knowledge building where agents have diverse specializations.
-  * **Frameworks:** Custom implementations with shared vector stores, Redis pub/sub, LangGraph shared state.
 ==== Event-Driven ====
-Agents subscribe to event streams and react to relevant events asynchronously. Enables loose coupling -- agents can be added or removed without modifying the overall system.
+Event-driven communication uses an event bus or message queue to decouple agent interactions. Agents publish events when they complete actions or detect relevant conditions, and other agents subscribe to events they care about. This pattern enables loose coupling, scalability, and asynchronous processing. Use event-driven architectures for systems with many agents that need to react to changing conditions. See [[event_driven_agents|Event-Driven Agents]].
-  * **When to use:** Real-time monitoring, workflow automation triggered by external events, scalable microservice-style agent systems.
-  * **Frameworks:** Apache Kafka + agent consumers, AWS EventBridge, custom event buses with LangGraph.
-----
 ===== Reliability Patterns =====
-Reliability patterns ensure **agents behave predictably and recover gracefully** from the inevitable failures in production systems. ((source [[https://appstekcorp.com/blog/design-patterns-for-agentic-ai-and-multi-agent-systems/|AppStek -- Design Patterns for Agentic AI]]))
+Reliability patterns ensure agent systems behave predictably and recover gracefully from failures.
 ==== Retry with Backoff ====
-When a tool call or LLM request fails, the agent retries with exponentially increasing delays. Handles transient network errors, rate limits, and temporary service outages without manual intervention.
+When an LLM call, tool invocation, or API request fails, the agent retries with exponentially increasing delays between attempts. This handles transient failures — rate limits, network blips, temporary service outages — without overwhelming the failing service. Use retry with backoff as a baseline reliability pattern for all external calls. See [[retry_patterns|Retry Patterns]].
-  * **When to use:** Any agent making external API calls or LLM requests in production.
-  * **Frameworks:** LangGraph stateful retries, tenacity (Python), built-in to most LLM SDK clients.
 ==== Fallback Chains ====
-Defines a cascade of backup strategies when the primary approach fails: try a simpler prompt, fall back to a different model, degrade gracefully to a cached response, or escalate to human handoff.
+Fallback chains define a prioritized list of alternative strategies to try when the primary approach fails. For example, if GPT-4 is unavailable, fall back to Claude; if the primary API fails, try a cached result. Each level in the chain may trade off quality for availability. Use fallback chains for production systems where uptime is critical. See [[fallback_chains|Fallback Chains]].
-  * **When to use:** High-availability systems where agent failure must not block the user.
-  * **Frameworks:** LangChain fallback chains, LangGraph conditional routing on failure, LiteLLM model fallbacks.
 ==== Circuit Breaker ====
-After repeated failures of a particular tool or service, the agent stops calling it for a cooldown period rather than continuing to fail. Prevents cascade failures and wasted resources. Borrowed directly from microservice architecture. ((source [[https://arxiv.org/html/2602.10479|Alenezi 2026 -- Evolution of Agentic AI Software Architecture]]))
+The circuit breaker pattern monitors failure rates for external services and temporarily stops calling a service that is consistently failing. After a cooldown period, the circuit breaker allows a test request through to see if the service has recovered. This prevents cascading failures and wasted resources on calls that are unlikely to succeed. Use circuit breakers when your agent depends on external services with variable reliability. See [[circuit_breaker|Circuit Breaker]].
-  * **When to use:** Agents depending on unreliable external services; production systems with strict latency budgets.
+==== Guardrails / Validation ====
-  * **Frameworks:** Custom middleware; emerging support in LangGraph and agent gateway layers.
-==== Guardrails / Output Validation ====
+Guardrails enforce constraints on agent inputs and outputs through validation layers. Input guardrails filter or reject harmful, off-topic, or malformed requests. Output guardrails validate that agent responses meet format requirements, factual accuracy checks, safety criteria, or business rules. Use guardrails in any production deployment to prevent harmful or incorrect agent behavior. See [[guardrails|Guardrails & Validation]].
-Structured checks on agent inputs and outputs: schema validation, toxicity filtering, factual consistency checks, PII detection, and policy compliance. Acts as a safety net between the agent's reasoning and its final output.
-  * **When to use:** Every production agent system. Non-negotiable for customer-facing applications.
-  * **Frameworks:** Guardrails AI, NeMo Guardrails (NVIDIA), LlamaGuard, Anthropic constitutional AI, custom validators.
 ==== Dual LLM (Planner + Executor) ====
-Two models serve different roles: a stronger/larger model handles planning and critical decisions, while a smaller/faster/cheaper model handles routine execution. Optimizes the cost-quality tradeoff across the agent workflow.
+The dual LLM pattern separates planning from execution using two different models or prompts. A capable, expensive model handles high-level planning and decision-making, while a faster, cheaper model handles routine execution steps. This balances quality and cost — critical thinking gets the best model, while mechanical tasks use an efficient one. Use the dual LLM pattern when cost optimization is important but complex reasoning quality must be preserved. See [[dual_llm_pattern|Dual LLM Pattern]].
-  * **When to use:** Cost-sensitive production systems, high-throughput pipelines where not every step requires frontier-model reasoning.
-  * **Frameworks:** LangGraph (heterogeneous model assignment), LiteLLM routing, custom orchestration.
-----
 ===== Efficiency Patterns =====
-Efficiency patterns optimize **cost, latency, and resource usage** -- critical concerns as agents scale from prototypes to production.
+Efficiency patterns reduce the cost, latency, and resource consumption of agent systems.
 ==== Caching (Semantic + Exact) ====
-**Exact caching** stores responses keyed by identical inputs. **Semantic caching** uses embedding similarity to reuse responses for semantically equivalent (but differently worded) queries. Both reduce redundant LLM calls, cutting cost and latency.
+Caching stores the results of previous LLM calls or tool invocations for reuse. Exact caching returns stored results when the input matches precisely. Semantic caching uses embedding similarity to return cached results for inputs that are semantically equivalent but not identical. Caching can dramatically reduce costs and latency for repetitive queries. Use caching when the agent handles recurring or similar requests. See [[agent_caching|Agent Caching]].
-  * **When to use:** High-traffic agents with repetitive queries; cost optimization for expensive models.
-  * **Frameworks:** Redis (semantic search), GPTCache, Mem0 relevance-scored caching, LangChain cache backends.
 ==== Speculative Execution ====
-The agent generates multiple candidate responses or tool call paths in parallel, verifying and selecting the fastest valid result. Trades compute for latency -- particularly effective when validation is cheap relative to generation.
+Speculative execution runs multiple possible next steps in parallel before knowing which one will actually be needed. When the decision point is reached, the pre-computed result for the chosen path is available immediately. This trades compute cost for reduced latency. Use speculative execution when latency is critical and the set of possible next steps is small and predictable. See [[speculative_execution|Speculative Execution]].
-  * **When to use:** Latency-critical applications; tasks where multiple valid approaches exist and the fastest matters.
-  * **Frameworks:** LangGraph parallel branches, custom implementations, speculative decoding in LLM inference engines (vLLM).
 ==== Budget-Aware Reasoning ====
-The agent monitors its token usage, API costs, or wall-clock time and adjusts its reasoning depth accordingly. May use cheaper models for simple sub-tasks, skip optional refinement steps, or invoke early stopping when confident.
+Budget-aware reasoning constrains the agent's resource consumption — limiting the number of LLM calls, total tokens, tool invocations, or wall-clock time. The agent must reason about how to allocate its budget across subtasks and may choose simpler strategies when resources are scarce. Use budget-aware reasoning in production systems where costs must be controlled or where response time SLAs exist. See [[budget_aware_reasoning|Budget-Aware Reasoning]].
-  * **When to use:** Production systems with cost budgets, metered API access, or strict latency SLAs.
-  * **Frameworks:** LangGraph step budgets, Mem0 relevance scoring and forgetting, custom token tracking middleware.
 ==== Parallel Tool Calling ====
-The agent invokes multiple independent tools simultaneously rather than sequentially. Requires the orchestration layer to detect independence between tool calls and dispatch them concurrently.
+Parallel tool calling executes multiple independent tool invocations simultaneously rather than sequentially. When the agent identifies that several pieces of information are needed and the requests are independent, it dispatches all calls at once and processes results as they arrive. This can reduce latency by a factor proportional to the number of parallel calls. Use parallel tool calling whenever the agent needs multiple independent pieces of external data. See [[parallel_tool_calling|Parallel Tool Calling]].
-  * **When to use:** Any agent workflow with independent tool calls (e.g., searching multiple sources, fetching data from multiple APIs).
-  * **Frameworks:** OpenAI parallel function calling, Anthropic parallel tool use, LangGraph fan-out, AutoGen concurrent execution.
-----
-===== Pattern Selection Guide =====
-^ Concern ^ Start With ^ Scale To ^
-| Single-agent reasoning | CoT / ReAct | ToT / Reflexion / Self-Consistency |
-| Multi-step tasks | Planning + Tool Use | Map-Reduce / Hierarchical Delegation |
-| Multi-agent coordination | Supervisor | Hierarchical / Swarm (depending on control needs) |
-| Memory | Short-term + RAG | Episodic + Working Memory + Knowledge Graphs |
-| Reliability | Guardrails + Retry | Circuit Breaker + Fallback Chains + Dual LLM |
-| Efficiency | Caching | Budget-Aware + Parallel Tool Calling + Speculative |
-----
 ===== See Also =====
-  * [[agent_loop|Agent Loop]]
   * [[react_framework|ReAct Framework]]
+  * [[multi_agent_systems|Multi-Agent Systems]]
   * [[planning|Planning]]
-  * [[multi_agent_systems|Multi-Agent Systems]]
+  * [[tool_use|Tool Use]]
-  * [[modular_architectures|Modular Architectures]]
+  * [[human_in_the_loop|Human-in-the-Loop]]
+  * [[chain_of_thought|Chain of Thought]]
+  * [[critic_self_correction|Critic & Self-Correction]]
+  * [[guardrails|Guardrails & Validation]]
+  * [[agentic_ai|Agentic AI Overview]]
 ===== References =====

AI Agent Knowledge Base

User Tools

Site Tools

Differences

Page Tools