====== Multi-Agent Architecture (Planner-Generator-Evaluator) ======

The **Planner-Generator-Evaluator (PGE) architecture** is a structured approach to decomposing complex AI agent tasks that separates planning, execution, and evaluation into distinct specialized components. It addresses a fundamental limitation of single-agent systems, in which a unified model must simultaneously reason about a task, implement a solution, and assess its quality: a combination that often leads to biased self-evaluation and suboptimal outcomes (([[https://alphasignalai.substack.com/p/a-closer-look-at-harness-engineering|AlphaSignal - Multi-Agent Architecture: Planner-Generator-Evaluator (2026)]])).

===== Conceptual Framework and GAN Inspiration =====

The PGE architecture draws inspiration from **Generative Adversarial Networks (GANs)**, which improve output quality through the structural opposition of a generator and a discriminator network. Similarly, the PGE framework decouples three critical functions that would otherwise conflict within a single agent (([[https://alphasignalai.substack.com/p/a-closer-look-at-harness-engineering|AlphaSignal - Multi-Agent Architecture: Planner-Generator-Evaluator (2026)]])).

The core insight is that **honest self-evaluation is fundamentally difficult for single agents**. A model that must both generate outputs and evaluate them faces an inherent conflict of interest: the model that produces a solution is motivated to judge that solution favorably. Introducing independent components creates structural incentives for honest assessment, much as human organizations isolate quality assurance from engineering teams to prevent bias.

===== Component Architecture and Responsibilities =====

**The Planner component** functions as the strategic brain of the system. Its primary responsibility is converting high-level user prompts and requirements into detailed specifications that guide subsequent execution. Rather than immediately attempting implementation, the Planner reasons abstractly about problem structure, requirements decomposition, and solution strategy. It produces structured specifications that define what should be built, without concerning itself with how to build it (([[https://alphasignalai.substack.com/p/a-closer-look-at-harness-engineering|AlphaSignal - Multi-Agent Architecture: Planner-Generator-Evaluator (2026)]])).

**The Generator component** serves as the execution engine, implementing features incrementally from the Planner's specifications. This agent focuses exclusively on translating abstract specifications into concrete implementations. Its responsibility is pragmatic and narrow: turning well-defined specifications into working code or features, without simultaneously judging whether those implementations meet broader quality standards (([[https://alphasignalai.substack.com/p/a-closer-look-at-harness-engineering|AlphaSignal - Multi-Agent Architecture: Planner-Generator-Evaluator (2026)]])).

**The Evaluator component** provides independent quality assessment through behavioral testing. Rather than relying on abstract quality metrics or the Generator's self-assessment, the Evaluator performs //actual interaction testing// using tools such as **Playwright**, a browser automation framework that lets the agent exercise outputs as a real user would. This moves evaluation from hypothetical assessment to empirical verification, identifying failures that would occur in actual usage (([[https://alphasignalai.substack.com/p/a-closer-look-at-harness-engineering|AlphaSignal - Multi-Agent Architecture: Planner-Generator-Evaluator (2026)]])).
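The division of responsibilities can be made concrete as a small orchestration loop. The following Python sketch is illustrative only: the ''Specification'' and ''EvaluationReport'' types, the stub agent functions, and the ''max_rounds'' refinement limit are assumptions, since the source describes the roles but not a concrete implementation.

<code python>
from dataclasses import dataclass, field


@dataclass
class Specification:
    """Planner output: defines what should be built, not how."""
    requirements: list[str]


@dataclass
class EvaluationReport:
    """Evaluator output: actionable feedback, not a bare pass/fail."""
    passed: bool
    failures: list[str] = field(default_factory=list)


def planner_agent(prompt: str) -> Specification:
    # Placeholder: a real Planner would decompose the prompt with an LLM.
    return Specification(requirements=[prompt])


def generator_agent(spec: Specification, feedback: list[str] | None = None) -> str:
    # Placeholder: a real Generator would turn the spec (plus any Evaluator
    # feedback) into working code or features.
    return "implementation of: " + "; ".join(spec.requirements)


def evaluator_agent(artifact: str) -> EvaluationReport:
    # Placeholder: a real Evaluator would run behavioral tests against the
    # artifact (see the Playwright sketch below).
    return EvaluationReport(passed=True)


def run_pge(prompt: str, max_rounds: int = 3) -> str:
    """Serial PGE cycle: plan once, then generate and evaluate until passing."""
    spec = planner_agent(prompt)
    artifact = generator_agent(spec)
    for _ in range(max_rounds):
        report = evaluator_agent(artifact)
        if report.passed:
            return artifact
        # Feedback loop: route specific failures back to the Generator.
        artifact = generator_agent(spec, feedback=report.failures)
    return artifact
</code>

Note how each role sees only its own inputs: the Planner never touches code, the Generator never grades its own work, and the Evaluator judges the artifact rather than the process that produced it.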
===== Technical Implementation and Testing Methodology =====

The Evaluator's use of **Playwright** marks a significant departure from purely model-based evaluation. Playwright provides headless browser automation that simulates real user interactions with generated web interfaces, applications, or services. The Evaluator can navigate generated interfaces, interact with components, verify responses, and catch failures that would manifest in actual usage but might be missed by the Generator's self-assessment.

This testing approach provides several advantages over model-based evaluation:

  * **Behavioral verification**: tests actual functionality rather than theoretical correctness
  * **Edge case detection**: identifies failures that emerge only in realistic interaction scenarios
  * **Regression prevention**: ensures iterative improvements do not break existing functionality
  * **Performance metrics**: measures real-world latency, responsiveness, and reliability

The separation also enables feedback loops: the Evaluator identifies specific failures, which are communicated back to the Planner and Generator for iterative improvement. The Evaluator does not merely approve or reject outputs; it provides actionable feedback about what failed and why.
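As a concrete illustration of behavioral testing, the sketch below drives a generated web page with Playwright's Python API and returns failure notes for the feedback loop. The URL, the CSS selectors, and the login-form scenario are hypothetical; the source names Playwright as the tool but does not publish a test harness.

<code python>
from playwright.sync_api import sync_playwright


def evaluate_login_page(url: str) -> list[str]:
    """Exercise a generated page as a real user would; return failure notes."""
    failures: list[str] = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        # Behavioral verification: interact with the rendered UI instead of
        # inspecting the generated source code.
        page.fill("#username", "test-user")
        page.fill("#password", "test-pass")
        page.click("button[type=submit]")

        # Empirical check of the observable outcome.
        if not page.locator(".welcome-banner").is_visible():
            failures.append(
                "Submitting valid credentials did not reveal the welcome banner"
            )

        browser.close()
    # An empty list means this behavioral check passed; otherwise the notes
    # become Evaluator feedback for the next generation round.
    return failures
</code>

Because the check observes rendered behavior rather than source text, it can surface failures such as broken event handlers or missing routes that a purely model-based review of the code would tend to miss.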
===== Trade-offs: Latency and Cost =====

The architectural benefits of PGE separation come at a measurable cost. Routing tasks through three specialized agents introduces **increased latency** compared to a unified agent. Each component performs serial computation, and hand-offs between components add coordination overhead. A single-agent system might generate and evaluate output in fewer sequential steps, while PGE requires planning, execution, and independent evaluation as distinct phases (([[https://alphasignalai.substack.com/p/a-closer-look-at-harness-engineering|AlphaSignal - Multi-Agent Architecture: Planner-Generator-Evaluator (2026)]])).

**Computational expense** rises similarly, since the system must run three separate agents and potentially multiple evaluation iterations. The Generator may need to revise implementations based on Evaluator feedback, triggering additional planning and generation cycles. For high-volume applications with strict latency requirements, these costs may be prohibitive.

These trade-offs suggest the PGE architecture is most appropriate where **quality and reliability outweigh latency constraints**: scenarios in which thorough evaluation and iterative refinement justify the additional computational cost.

===== Related Approaches and Positioning =====

The PGE architecture connects to broader research on **multi-agent systems** and **collaborative reasoning in language models**. Related approaches include chain-of-thought prompting, which separates reasoning from conclusion, and retrieval-augmented generation (RAG), which decouples information retrieval from synthesis. The PGE framework extends this separation philosophy to the planning-execution-evaluation cycle, creating structural independence at a higher level of task decomposition.

===== See Also =====

  * [[planning|Agent Planning: How AI Agents Plan and Reason]]
  * [[single_agent_architecture|Single Agent Architecture: Design Patterns for Solo AI Agents]]
  * [[ai_agents|AI Agents]]
  * [[modular_architectures|Modular Architectures]]
  * [[agent_as_a_judge|Agent-as-a-Judge]]

===== References =====