====== Mixture of Agents ======

**Mixture of Agents (MoA)** is a methodology for leveraging the collective strengths of multiple large language models (LLMs) through a layered architecture in which the models iteratively refine each other's outputs. Proposed by Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, and James Zou, MoA achieves state-of-the-art performance on major benchmarks using only open-source LLMs, surpassing GPT-4 Omni on benchmarks such as AlpacaEval 2.0. The paper was accepted as a **Spotlight paper at ICLR 2025**.

===== Overview =====

The core insight behind MoA is that LLMs tend to generate better responses when given outputs from other models as references, even when those reference outputs are individually of lower quality. MoA exploits this "collaborativeness" property by organizing models into layers, where each layer's agents refine the outputs of the previous layer.

This approach demonstrates that orchestrating multiple smaller, open-source models can exceed the performance of the largest proprietary models, offering a practical path to high-quality LLM outputs without dependence on any single provider.

===== Architecture =====

MoA organizes LLMs into **l layers**, each containing **n agents**:

  * **Layer 1**: Agents A(1,1) through A(1,n) generate independent initial responses to the input prompt
  * **Layer i > 1**: Each agent receives all outputs from the previous layer as auxiliary context and produces a refined response
  * **Final layer**: An aggregator synthesizes the last layer's outputs into the final response

The process repeats for multiple cycles (typically 2-3 layers), with each iteration yielding more robust outputs by addressing individual model weaknesses through diverse perspectives.

==== Proposers and Aggregators ====

MoA distinguishes two functional roles:

  * **Proposers** — Models that generate diverse candidate responses in early layers. Selected for high output diversity and heterogeneity to maximize the information available to later layers
  * **Aggregators** — Models in later layers that synthesize, merge, and refine prior outputs. Even when proposer outputs score poorly individually, aggregators can extract and combine their strengths effectively

Adding further aggregator layers iteratively improves quality: on MATH reasoning tasks, accuracy improves from 0.428 at Layer 1 (Qwen1.5-72B-Chat) to 0.552 by Layer 3.

===== Benchmark Results =====

MoA achieves leading results across major LLM evaluation benchmarks:

^ Benchmark ^ MoA Score ^ GPT-4 Omni ^ Margin ^
| AlpacaEval 2.0 | 65.1% | 57.5% | +7.6 pts |
| MT-Bench | 9.25 avg | 9.19 | +0.06 |
| Arena-Hard | SOTA | — | — |
| FLASK | Outperforms | — | Correctness, factuality |

Notably, the MoA configuration achieving 65.1% on AlpacaEval 2.0 uses **only open-source LLMs**, demonstrating that collective inference with smaller models can surpass the best proprietary models.

===== Code Example =====

A simplified MoA implementation with proposers and an aggregator (model names are illustrative placeholders, not exact API identifiers):

  import os
  from openai import OpenAI

  PROPOSER_MODELS = ["mistral-7b", "llama-3-70b", "qwen-72b"]
  AGGREGATOR_MODEL = "qwen-72b"

  def moa_inference(prompt: str, num_layers: int = 3) -> str:
      # Assumes TOGETHER_API_KEY is set for the OpenAI-compatible endpoint.
      client = OpenAI(
          base_url="https://api.together.xyz/v1",
          api_key=os.environ.get("TOGETHER_API_KEY"),
      )

      # Layer 1: independent proposals
      proposals = []
      for model in PROPOSER_MODELS:
          response = client.chat.completions.create(
              model=model,
              messages=[{"role": "user", "content": prompt}],
          )
          proposals.append(response.choices[0].message.content)

      # Layers 2+: each proposer refines using all previous-layer outputs
      for layer in range(1, num_layers):
          context = "\n\n".join(
              f"[Model {i+1}]: {p}" for i, p in enumerate(proposals)
          )
          refined = []
          for model in PROPOSER_MODELS:
              response = client.chat.completions.create(
                  model=model,
                  messages=[
                      {"role": "system", "content": f"Previous responses:\n{context}"},
                      {"role": "user", "content": f"Synthesize and improve: {prompt}"},
                  ],
              )
              refined.append(response.choices[0].message.content)
          proposals = refined

      # Final aggregation: a single aggregator synthesizes the last layer
      final_context = "\n\n".join(
          f"[Response {i+1}]: {p}" for i, p in enumerate(proposals)
      )
      final = client.chat.completions.create(
          model=AGGREGATOR_MODEL,
          messages=[
              {"role": "system", "content": f"Synthesize the best answer:\n{final_context}"},
              {"role": "user", "content": prompt},
          ],
      )
      return final.choices[0].message.content

===== References =====

  * [[https://arxiv.org/abs/2406.04692|arXiv:2406.04692 — Mixture-of-Agents Enhances Large Language Model Capabilities]]
  * [[https://proceedings.iclr.cc/paper_files/paper/2025/hash/5434be94e82c54327bb9dcaf7fca52b6-Abstract-Conference.html|ICLR 2025 — MoA (Spotlight Paper)]]
  * [[https://github.com/togethercomputer/moa|Together AI — MoA GitHub Repository]]
  * [[https://openreview.net/forum?id=h0ZfDIrj7T|OpenReview — MoA ICLR 2025 Submission]]

===== See Also =====

  * [[multi_agent_systems|Multi-Agent Systems]]
  * [[llm_ensembles|LLM Ensembles]]
  * [[chain_of_thought|Chain-of-Thought Prompting]]
  * [[self_evolving_agents|Self-Evolving Agents]]
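For readers who want to see the layered control flow without any API dependencies, it can be sketched with toy stand-ins. This is a minimal sketch of the proposer/aggregator pattern only; the function and model names here are illustrative inventions, not part of the paper or any real API:

```python
# Dependency-free sketch of MoA's layered control flow.
# Every "model" here is a toy stand-in, not a real LLM call.

def make_proposer(name: str):
    """Return a toy proposer that optionally sees reference outputs."""
    def propose(prompt: str, references=None) -> str:
        note = f" [refined {len(references)} refs]" if references else ""
        return f"{name}: answer to {prompt!r}{note}"
    return propose

def aggregate(prompt: str, references: list) -> str:
    # A real aggregator is itself an LLM; here we simply join candidates.
    return " | ".join(references)

def moa(prompt: str, proposers, num_layers: int = 3) -> str:
    # Layer 1: independent proposals
    outputs = [p(prompt) for p in proposers]
    # Layers 2..num_layers: each agent sees all previous-layer outputs
    for _ in range(num_layers - 1):
        outputs = [p(prompt, references=outputs) for p in proposers]
    # Final aggregation over the last layer's outputs
    return aggregate(prompt, outputs)

proposers = [make_proposer(n) for n in ("model-a", "model-b", "model-c")]
print(moa("What is 2+2?", proposers))
```

With three proposers and the default three layers, each refinement round passes all three previous outputs to every agent, mirroring the auxiliary-context mechanism of the full implementation above.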