====== Single vs Multi-Agent Architectures ======

Choosing between single-agent and multi-agent architectures is a critical design decision that affects cost, latency, reliability, and maintainability. This guide provides a research-backed framework for that choice, based on published benchmarks and real-world deployments.

===== Overview =====

  * **Single Agent** — One LLM-powered agent handles the entire workflow with access to all tools and context. Simple, fast, predictable.
  * **Orchestrator + Specialists**(([[https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ai-agents/single-agent-multiple-agents|Microsoft Azure - Single Agent vs Multiple Agents]])) — A central coordinator delegates subtasks to domain-specific agents. Modular and scalable, but adds coordination overhead.
  * **Peer-to-Peer** — Autonomous agents collaborate via message passing or shared memory. Maximum parallelism, hardest to debug.
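The criteria that separate these three patterns can be folded into a rough routing heuristic. The sketch below is illustrative only: the function name, parameters, and cutoffs mirror the thresholds used throughout this guide and are not a normative rule.

```python
def choose_architecture(domains: int, tools: int, predictable: bool,
                        parallel: bool, daily_ops: int) -> str:
    """Suggest an agent architecture from this guide's rules of thumb.

    Cutoffs follow the complexity thresholds: <= 5 tools, <= 2 knowledge
    domains, and < 10K daily operations point to a single agent.
    """
    # Few domains/tools, predictable workflow, moderate volume: stay simple
    if domains <= 2 and tools <= 5 and predictable and daily_ops < 10_000:
        return "single-agent"
    # Many independent domains that need real-time parallelism
    if domains >= 3 and parallel:
        return "peer-to-peer"
    # Multi-domain or unpredictable work without hard parallelism needs
    return "orchestrator"
```

For example, a predictable three-tool assistant in one domain maps to `"single-agent"`, while a four-domain pipeline with parallelizable subtasks maps to `"peer-to-peer"`.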
===== Decision Tree =====

<code>
graph TD
    A[Start: Define Your Task] --> B{How many distinct\ndomains or skills?}
    B -->|1-2| C{Workflow\npredictable?}
    B -->|3+| D{Tasks\nindependent?}
    C -->|Yes| E[Single Agent]
    C -->|No| F{Context window\nsufficient?}
    F -->|Yes| E
    F -->|No| G[Orchestrator + Specialists]
    D -->|Yes| H{Need real-time\nparallelism?}
    D -->|No| G
    H -->|Yes| I[Peer-to-Peer]
    H -->|No| G
    E --> J{Scaling beyond\n10K ops per day?}
    J -->|Yes| K[Consider Multi-Agent]
    J -->|No| L[Stay Single Agent]
    style E fill:#4CAF50,color:#fff
    style G fill:#FF9800,color:#fff
    style I fill:#9C27B0,color:#fff
    style L fill:#4CAF50,color:#fff
    style K fill:#FF9800,color:#fff
</code>

===== Architecture Comparison =====

^ Factor ^ Single Agent ^ Orchestrator + Specialists ^ Peer-to-Peer ^
| **Complexity** | Low | Medium | High |
| **Latency** | Lowest (1 LLM call) | Medium (2-5 LLM calls) | Variable (parallel) |
| **Token Cost** | 1x baseline | 3-5x baseline | 10-15x baseline |
| **Debugging** | Simple, unified logs | Moderate, trace per agent | Hard, distributed tracing |
| **Failure Mode** | Single point of failure | Isolated failures, graceful degradation | Partial ops continue |
| **Scalability** | Limited by context window | Good with agent specialization | Best for high parallelism |
| **Accuracy (SWE-bench)** | ~65% | ~72% | Similar to orchestrator |
| **Best For** | Sequential, well-defined tasks | Multi-domain workflows | High-throughput parallel tasks |

//Sources: SWE-bench Verified 2025, Redis engineering blog, Microsoft Azure architecture guidance//

===== Complexity Thresholds =====

Use these rules of thumb, based on published research:

^ Indicator ^ Single Agent ^ Multi-Agent ^
| Tool/function count | 1-5 | 6+ |
| Distinct knowledge domains | 1-2 | 3+ |
| Daily operations | Less than 10,000 | Over 10,000 |
| Context required | Fits one window | Exceeds context limits |
| Task independence | Sequential | Parallelizable |
| Error tolerance | Can retry whole workflow | Needs isolated recovery |

===== Pattern Details =====

=== Single Agent ===

One agent, one context window, all tools available. Start here.

<code python>
# Single agent with tool access: one model call, and the agent decides
# which tools to invoke in sequence
from openai import OpenAI

client = OpenAI()
tools = [search_tool, calculator_tool, database_tool]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are a research assistant with access to search, "
                    "calculation, and database tools."},
        {"role": "user", "content": user_query},
    ],
    tools=tools,
)
</code>

**Strengths**: Low latency, simple debugging, minimal token overhead, easy to deploy.

**Weaknesses**: Context window limits, single point of failure, struggles with 6+ tools.

=== Orchestrator + Specialists ===

A coordinator routes tasks to domain experts. This is the most common multi-agent pattern.

<code python>
# Orchestrator pattern: decompose the task, route each subtask to a
# domain-specific agent, then synthesize the results
class Orchestrator:
    def __init__(self):
        self.agents = {
            "researcher": ResearchAgent(model="gpt-4o"),
            "coder": CodingAgent(model="claude-sonnet"),
            "reviewer": ReviewAgent(model="gpt-4o"),
        }

    def process(self, task):
        plan = self.plan(task)  # Decompose into subtasks
        results = {}
        for step in plan:
            agent = self.agents[step.agent_type]
            results[step.id] = agent.execute(
                step.instruction,
                context=self.gather_context(step, results),
            )
        return self.synthesize(results)
</code>

**Strengths**: Modular, each agent optimized for its domain, isolated failures, scalable.

**Weaknesses**: The orchestrator becomes a bottleneck, token cost is 3-5x, and coordination adds latency.
=== Peer-to-Peer ===

Agents communicate directly without central control. Best for embarrassingly parallel tasks.

<code python>
# Peer-to-peer with shared memory: each agent publishes its output and
# reacts to updates from the others; no central coordinator
import asyncio

class PeerAgent:
    def __init__(self, role, shared_memory):
        self.role = role
        self.memory = shared_memory  # Shared state store

    async def work(self, task):
        result = await self.llm_call(task)
        await self.memory.publish(self.role, result)
        # React to other agents' outputs
        async for update in self.memory.subscribe():
            if self.should_respond(update):
                await self.respond(update)

async def main(task, mem):
    # Launch agents in parallel
    agents = [PeerAgent("analyst", mem),
              PeerAgent("writer", mem),
              PeerAgent("critic", mem)]
    await asyncio.gather(*[a.work(task) for a in agents])
</code>

**Strengths**: Maximum parallelism, no central bottleneck, up to 64% throughput improvement on parallelizable workloads.

**Weaknesses**: 10-15x token cost, chaotic coordination, extremely hard to debug.

===== Real-World Frameworks =====

^ Framework ^ Pattern ^ Best For ^ Key Feature ^
| **LangGraph** | Orchestrator | Stateful multi-step workflows | Graph-based state machines |
| **CrewAI** | Orchestrator + Specialists | Role-based team workflows | Agent roles and delegation |
| **AutoGen** | Peer-to-Peer | Research and collaborative tasks | Dynamic agent conversations |
| **OpenAI Swarm** | Orchestrator | Lightweight agent handoffs | Minimal coordination overhead |
| **Claude Tools** | Single Agent | Tool-heavy sequential tasks | Native tool use, large context |

===== Failure Modes and Mitigation =====

=== Single Agent Failures ===

  * **Context overflow** — Mitigate with summarization or RAG
  * **Tool selection errors** — Mitigate with better tool descriptions
  * **Total crash** — Mitigate with retry logic and checkpointing

=== Multi-Agent Failures ===

  * **Coordination deadlock** — Mitigate with timeouts and fallback agents
  * **Message storms** — Mitigate with rate limiting and turn-taking protocols
  * **Inconsistent state** — Mitigate with shared memory and consensus mechanisms
  * **Cascading failures** — Mitigate with circuit breakers per agent

===== Performance Benchmarks =====

From published 2025-2026 research:

  * Multi-agent coding systems score **72.2% on SWE-bench Verified** vs ~65% for single agents using the same base model(([[https://arxiv.org/html/2509.10769v1|AgentArch Benchmark - Evaluating Agent Architectures]]))(([[https://redis.io/blog/single-agent-vs-multi-agent-systems/|Redis - Single Agent vs Multi-Agent Systems]]))
  * Multi-agent systems show **23% higher accuracy** on complex reasoning tasks(([[https://vibecoding.app/blog/multi-agent-vs-single-agent-coding|VibeCoding - Multi-Agent vs Single-Agent Coding]]))
  * Multi-agent throughput improves by **up to 64%** for parallelizable workloads
  * Multi-agent token cost is **10-15x higher** for complex requests
  * Single-agent latency is typically **2-5x lower** than in orchestrated systems

===== Key Takeaways =====

  - **Default to single agent.** It covers 80% of use cases with lower cost and complexity.
  - **Add agents for specialization**, not just because you can. Each agent should have a clear, distinct role.
  - **Orchestrator + Specialists** is the most practical multi-agent pattern for production.
  - **Peer-to-peer** is rarely needed outside high-throughput parallel processing.
  - **Measure the tradeoff**: multi-agent gains 7-23% accuracy but costs 3-15x more tokens.

===== See Also =====

  * [[when_to_use_rag_vs_fine_tuning|When to Use RAG vs Fine-Tuning]] — Choosing knowledge strategies
  * [[how_to_structure_system_prompts|How to Structure System Prompts]] — Prompt design for agents
  * [[how_to_choose_chunk_size|How to Choose Chunk Size]] — RAG optimization

===== References =====