Mixture of Agents

Mixture of Agents (MoA) is a methodology for leveraging the collective strengths of multiple large language models through a layered architecture where LLMs iteratively refine each other's outputs. Proposed by Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, and James Zou, MoA achieves state-of-the-art performance on major benchmarks using only open-source LLMs, surpassing GPT-4 Omni. The paper was accepted as a Spotlight paper at ICLR 2025.

Overview

The core insight behind MoA is that LLMs tend to generate better responses when provided with outputs from other models as reference, even if those reference outputs are of lower quality individually. MoA exploits this “collaborativeness” property by organizing models into layers where each layer's agents refine the outputs of the previous layer.

This approach demonstrates that orchestrating multiple smaller, open-source models can exceed the performance of the largest proprietary models, offering a practical path to high-quality LLM outputs without dependence on any single provider.

Architecture

MoA organizes LLMs into l layers, each containing n agents. Agents in the first layer respond to the user prompt independently; each agent in a subsequent layer receives the prompt together with all n outputs from the previous layer and produces a refined response.

The process repeats for multiple cycles (typically 2-3 layers), with each iteration yielding more robust outputs by addressing individual model weaknesses through diverse perspectives.
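The layered dataflow described above can be sketched in plain Python, with stub agents standing in for LLM calls (the agent names and helper functions here are illustrative, not from the paper):

```python
from typing import Callable, List

# An agent maps (prompt, outputs from the previous layer) -> a response.
Agent = Callable[[str, List[str]], str]

def moa_layers(prompt: str, layers: List[List[Agent]]) -> List[str]:
    """Run the MoA dataflow: each layer's agents see the prompt plus
    every output produced by the previous layer."""
    outputs: List[str] = []
    for layer in layers:
        outputs = [agent(prompt, outputs) for agent in layer]
    return outputs

# Stub agents that just record what they were shown.
def make_agent(name: str) -> Agent:
    def agent(prompt: str, prior: List[str]) -> str:
        return f"{name}({prompt}; saw {len(prior)} prior)"
    return agent

layers = [
    [make_agent("A1"), make_agent("A2")],  # layer 1: no prior outputs
    [make_agent("B1"), make_agent("B2")],  # layer 2: sees layer 1's 2 outputs
]
print(moa_layers("q", layers))
# ['B1(q; saw 2 prior)', 'B2(q; saw 2 prior)']
```

In the real system each stub would be a chat-completion call, and a final aggregator would condense the last layer's outputs into a single answer.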

Proposers and Aggregators

MoA distinguishes two functional roles: proposers, which generate diverse candidate responses that supply useful context and perspectives, and aggregators, which synthesize those candidates into a single high-quality response. Many models can serve effectively in either role.

Adding aggregator layers iteratively improves quality. On MATH reasoning tasks, layer-1 accuracy of 0.428 (with Qwen1.5-72B-Chat as aggregator) improves to 0.552 by layer 3.
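The aggregator's instruction can be expressed as a system prompt that presents the proposers' candidates and asks for critical synthesis. The wording below paraphrases the spirit of the paper's aggregate-and-synthesize prompt and is not the verbatim text:

```python
def build_aggregator_prompt(responses: list) -> str:
    """Build an aggregator system prompt (paraphrased, not the paper's
    verbatim wording) that lists the proposers' candidate responses."""
    header = (
        "You have been provided with responses from several models to the "
        "latest user query. Synthesize them into a single, high-quality "
        "answer. Critically evaluate the responses: some may be biased or "
        "incorrect. Do not simply copy any one of them."
    )
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(responses, 1))
    return f"{header}\n\nResponses:\n{numbered}"

print(build_aggregator_prompt(["Paris.", "The capital of France is Paris."]))
```

The key design choice is that the aggregator is told to evaluate and recombine rather than pick a winner, which is what lets lower-quality references still contribute.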

Benchmark Results

MoA achieves leading results across major LLM evaluation benchmarks:

Benchmark        MoA Score    GPT-4 Omni    Margin
AlpacaEval 2.0   65.1%        57.5%         +7.6%
MT-Bench         9.25 (avg)   9.19          +0.06
Arena-Hard       state of the art among evaluated models
FLASK            outperforms GPT-4 Omni on correctness and factuality

Notably, the MoA configuration achieving 65.1% on AlpacaEval 2.0 uses only open-source LLMs, demonstrating that collective inference with smaller models can surpass the best proprietary models.

Code Example

A simplified MoA implementation with proposers and an aggregator:

import os

from openai import OpenAI

PROPOSER_MODELS = ["mistral-7b", "llama-3-70b", "qwen-72b"]
AGGREGATOR_MODEL = "qwen-72b"

def moa_inference(prompt: str, num_layers: int = 3) -> str:
    # Any OpenAI-compatible endpoint works; Together is used here as an example.
    client = OpenAI(
        base_url="https://api.together.xyz/v1",
        api_key=os.environ["TOGETHER_API_KEY"],
    )
 
    # Layer 1: Independent proposals
    proposals = []
    for model in PROPOSER_MODELS:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}]
        )
        proposals.append(response.choices[0].message.content)
 
    # Layers 2+: Iterative refinement
    for layer in range(1, num_layers):
        context = "\n\n".join(
            f"[Model {i+1}]: {p}" for i, p in enumerate(proposals)
        )
        refined = []
        for model in PROPOSER_MODELS:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": f"Previous responses:\n{context}"},
                    {"role": "user", "content": f"Synthesize and improve: {prompt}"}
                ]
            )
            refined.append(response.choices[0].message.content)
        proposals = refined
 
    # Final aggregation
    final_context = "\n\n".join(
        f"[Response {i+1}]: {p}" for i, p in enumerate(proposals)
    )
    final = client.chat.completions.create(
        model=AGGREGATOR_MODEL,
        messages=[
            {"role": "system", "content": f"Synthesize the best answer:\n{final_context}"},
            {"role": "user", "content": prompt}
        ]
    )
    return final.choices[0].message.content
