Mixture of Agents

Mixture of Agents (MoA) is a methodology for leveraging the collective strengths of multiple large language models through a layered architecture where LLMs iteratively refine each other's outputs. Proposed by Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, and James Zou, MoA achieves state-of-the-art performance on major benchmarks using only open-source LLMs, surpassing GPT-4 Omni. The paper was accepted as a Spotlight paper at ICLR 2025.

Overview

The core insight behind MoA is that LLMs tend to generate better responses when provided with outputs from other models as reference, even if those reference outputs are of lower quality individually. MoA exploits this “collaborativeness” property by organizing models into layers where each layer's agents refine the outputs of the previous layer.

This approach demonstrates that orchestrating multiple smaller, open-source models can exceed the performance of the largest proprietary models, offering a practical path to high-quality LLM outputs without dependence on any single provider.

Architecture

MoA organizes LLMs into l layers, each containing n agents. Agents in the first layer respond to the user prompt independently; each agent in a subsequent layer receives the prompt together with all n outputs from the previous layer and produces a refined response.

The process repeats for multiple cycles (typically 2-3 layers), with each iteration yielding more robust outputs by addressing individual model weaknesses through diverse perspectives.
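The layered dataflow described above can be sketched in plain Python, with stub agents standing in for LLM calls (the agent names and helper functions here are illustrative, not from the paper):

```python
from typing import Callable, List

# An agent maps (prompt, outputs from the previous layer) -> a response.
Agent = Callable[[str, List[str]], str]

def moa_layers(prompt: str, layers: List[List[Agent]]) -> List[str]:
    """Run the MoA dataflow: each layer's agents see the prompt plus
    every output produced by the previous layer."""
    outputs: List[str] = []
    for layer in layers:
        outputs = [agent(prompt, outputs) for agent in layer]
    return outputs

# Stub agents that just record what they were shown.
def make_agent(name: str) -> Agent:
    def agent(prompt: str, prior: List[str]) -> str:
        return f"{name}({prompt}; saw {len(prior)} prior)"
    return agent

layers = [
    [make_agent("A1"), make_agent("A2")],  # layer 1: no prior outputs
    [make_agent("B1"), make_agent("B2")],  # layer 2: sees layer 1's 2 outputs
]
print(moa_layers("q", layers))
# ['B1(q; saw 2 prior)', 'B2(q; saw 2 prior)']
```

In the real system each stub would be a chat-completion call, and a final aggregator would condense the last layer's outputs into a single answer.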

Proposers and Aggregators

MoA distinguishes two functional roles: proposers, which generate diverse candidate responses that supply useful context and perspectives, and aggregators, which synthesize those candidates into a single high-quality response. Many models can serve effectively in either role.

Adding aggregator layers iteratively improves quality. On MATH reasoning tasks, layer-1 accuracy of 0.428 (with Qwen1.5-72B-Chat as aggregator) improves to 0.552 by layer 3.
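The aggregator's instruction can be expressed as a system prompt that presents the proposers' candidates and asks for critical synthesis. The wording below paraphrases the spirit of the paper's aggregate-and-synthesize prompt and is not the verbatim text:

```python
def build_aggregator_prompt(responses: list) -> str:
    """Build an aggregator system prompt (paraphrased, not the paper's
    verbatim wording) that lists the proposers' candidate responses."""
    header = (
        "You have been provided with responses from several models to the "
        "latest user query. Synthesize them into a single, high-quality "
        "answer. Critically evaluate the responses: some may be biased or "
        "incorrect. Do not simply copy any one of them."
    )
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(responses, 1))
    return f"{header}\n\nResponses:\n{numbered}"

print(build_aggregator_prompt(["Paris.", "The capital of France is Paris."]))
```

The key design choice is that the aggregator is told to evaluate and recombine rather than pick a winner, which is what lets lower-quality references still contribute.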

Benchmark Results

MoA achieves leading results across major LLM evaluation benchmarks:

Benchmark        MoA Score    GPT-4 Omni    Margin
AlpacaEval 2.0   65.1%        57.5%         +7.6%
MT-Bench         9.25 (avg)   9.19          +0.06
Arena-Hard       state of the art among evaluated models
FLASK            outperforms GPT-4 Omni on correctness and factuality

Notably, the MoA configuration achieving 65.1% on AlpacaEval 2.0 uses only open-source LLMs, demonstrating that collective inference with smaller models can surpass the best proprietary models.

Code Example

A simplified MoA implementation with proposers and an aggregator:

import os

from openai import OpenAI

PROPOSER_MODELS = ["mistral-7b", "llama-3-70b", "qwen-72b"]
AGGREGATOR_MODEL = "qwen-72b"

def moa_inference(prompt: str, num_layers: int = 3) -> str:
    # Any OpenAI-compatible endpoint works; Together is used here as an example.
    client = OpenAI(
        base_url="https://api.together.xyz/v1",
        api_key=os.environ["TOGETHER_API_KEY"],
    )
 
    # Layer 1: Independent proposals
    proposals = []
    for model in PROPOSER_MODELS:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}]
        )
        proposals.append(response.choices[0].message.content)
 
    # Layers 2+: Iterative refinement
    for layer in range(1, num_layers):
        context = "\n\n".join(
            f"[Model {i+1}]: {p}" for i, p in enumerate(proposals)
        )
        refined = []
        for model in PROPOSER_MODELS:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": f"Previous responses:\n{context}"},
                    {"role": "user", "content": f"Synthesize and improve: {prompt}"}
                ]
            )
            refined.append(response.choices[0].message.content)
        proposals = refined
 
    # Final aggregation
    final_context = "\n\n".join(
        f"[Response {i+1}]: {p}" for i, p in enumerate(proposals)
    )
    final = client.chat.completions.create(
        model=AGGREGATOR_MODEL,
        messages=[
            {"role": "system", "content": f"Synthesize the best answer:\n{final_context}"},
            {"role": "user", "content": prompt}
        ]
    )
    return final.choices[0].message.content
