====== Mixture of Agents ======

**Mixture of Agents (MoA)** is a methodology for leveraging the collective strengths of multiple large language models (LLMs) through a layered architecture in which the models iteratively refine each other's outputs. Proposed by Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, and James Zou, MoA achieves state-of-the-art performance on major benchmarks using only open-source LLMs, surpassing GPT-4 Omni on benchmarks such as AlpacaEval 2.0. The paper was accepted as a **Spotlight paper at ICLR 2025**.

===== Overview =====

The core insight behind MoA is that LLMs tend to generate better responses when given outputs from other models as references, even when those reference outputs are individually of lower quality. MoA exploits this "collaborativeness" property by organizing models into layers, where each layer's agents refine the outputs of the previous layer.

This approach demonstrates that orchestrating multiple smaller, open-source models can exceed the performance of the largest proprietary models, offering a practical path to high-quality LLM outputs without dependence on any single provider.

===== Architecture =====

MoA organizes LLMs into **l layers**, each containing **n agents**:

  * **Layer 1**: Agents A(1,1) through A(1,n) generate independent initial responses to the input prompt
  * **Layer i > 1**: Each agent receives all outputs from the previous layer as auxiliary context and produces a refined response
  * **Final layer**: An aggregator synthesizes the last layer's outputs into the final response

The process repeats for multiple cycles (typically 2-3 layers), with each iteration yielding more robust outputs by addressing individual model weaknesses through diverse perspectives.

==== Proposers and Aggregators ====

MoA distinguishes two functional roles:

  * **Proposers** — Models that generate diverse candidate responses in early layers. Selected for high output diversity and heterogeneity to maximize the information available to later layers
  * **Aggregators** — Models in later layers that synthesize, merge, and refine prior outputs. Even when proposer outputs score poorly individually, aggregators can extract and combine their strengths effectively

Adding further aggregator layers iteratively improves quality: on MATH reasoning tasks, accuracy improves from 0.428 at Layer 1 (Qwen1.5-72B-Chat) to 0.552 by Layer 3.

===== Benchmark Results =====

MoA achieves leading results across major LLM evaluation benchmarks:

^ Benchmark ^ MoA Score ^ GPT-4 Omni ^ Margin ^
| AlpacaEval 2.0 | 65.1% | 57.5% | +7.6 pts |
| MT-Bench | 9.25 avg | 9.19 | +0.06 |
| Arena-Hard | SOTA | — | — |
| FLASK | Outperforms | — | Correctness, factuality |

Notably, the MoA configuration achieving 65.1% on AlpacaEval 2.0 uses **only open-source LLMs**, demonstrating that collective inference with smaller models can surpass the best proprietary models.

===== Code Example =====

A simplified MoA implementation with proposers and an aggregator (model names are illustrative placeholders, not exact API identifiers):

  import os
  from openai import OpenAI

  PROPOSER_MODELS = ["mistral-7b", "llama-3-70b", "qwen-72b"]
  AGGREGATOR_MODEL = "qwen-72b"

  def moa_inference(prompt: str, num_layers: int = 3) -> str:
      # Assumes TOGETHER_API_KEY is set for the OpenAI-compatible endpoint.
      client = OpenAI(
          base_url="https://api.together.xyz/v1",
          api_key=os.environ.get("TOGETHER_API_KEY"),
      )

      # Layer 1: independent proposals
      proposals = []
      for model in PROPOSER_MODELS:
          response = client.chat.completions.create(
              model=model,
              messages=[{"role": "user", "content": prompt}],
          )
          proposals.append(response.choices[0].message.content)

      # Layers 2+: each proposer refines using all previous-layer outputs
      for layer in range(1, num_layers):
          context = "\n\n".join(
              f"[Model {i+1}]: {p}" for i, p in enumerate(proposals)
          )
          refined = []
          for model in PROPOSER_MODELS:
              response = client.chat.completions.create(
                  model=model,
                  messages=[
                      {"role": "system", "content": f"Previous responses:\n{context}"},
                      {"role": "user", "content": f"Synthesize and improve: {prompt}"},
                  ],
              )
              refined.append(response.choices[0].message.content)
          proposals = refined

      # Final aggregation: a single aggregator synthesizes the last layer
      final_context = "\n\n".join(
          f"[Response {i+1}]: {p}" for i, p in enumerate(proposals)
      )
      final = client.chat.completions.create(
          model=AGGREGATOR_MODEL,
          messages=[
              {"role": "system", "content": f"Synthesize the best answer:\n{final_context}"},
              {"role": "user", "content": prompt},
          ],
      )
      return final.choices[0].message.content

===== References =====

  * [[https://arxiv.org/abs/2406.04692|arXiv:2406.04692 — Mixture-of-Agents Enhances Large Language Model Capabilities]]
  * [[https://proceedings.iclr.cc/paper_files/paper/2025/hash/5434be94e82c54327bb9dcaf7fca52b6-Abstract-Conference.html|ICLR 2025 — MoA (Spotlight Paper)]]
  * [[https://github.com/togethercomputer/moa|Together AI — MoA GitHub Repository]]
  * [[https://openreview.net/forum?id=h0ZfDIrj7T|OpenReview — MoA ICLR 2025 Submission]]

===== See Also =====

  * [[multi_agent_systems|Multi-Agent Systems]]
  * [[llm_ensembles|LLM Ensembles]]
  * [[chain_of_thought|Chain-of-Thought Prompting]]
  * [[self_evolving_agents|Self-Evolving Agents]]
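For readers who want to see the layered control flow without any API dependencies, it can be sketched with toy stand-ins. This is a minimal sketch of the proposer/aggregator pattern only; the function and model names here are illustrative inventions, not part of the paper or any real API:

```python
# Dependency-free sketch of MoA's layered control flow.
# Every "model" here is a toy stand-in, not a real LLM call.

def make_proposer(name: str):
    """Return a toy proposer that optionally sees reference outputs."""
    def propose(prompt: str, references=None) -> str:
        note = f" [refined {len(references)} refs]" if references else ""
        return f"{name}: answer to {prompt!r}{note}"
    return propose

def aggregate(prompt: str, references: list) -> str:
    # A real aggregator is itself an LLM; here we simply join candidates.
    return " | ".join(references)

def moa(prompt: str, proposers, num_layers: int = 3) -> str:
    # Layer 1: independent proposals
    outputs = [p(prompt) for p in proposers]
    # Layers 2..num_layers: each agent sees all previous-layer outputs
    for _ in range(num_layers - 1):
        outputs = [p(prompt, references=outputs) for p in proposers]
    # Final aggregation over the last layer's outputs
    return aggregate(prompt, outputs)

proposers = [make_proposer(n) for n in ("model-a", "model-b", "model-c")]
print(moa("What is 2+2?", proposers))
```

With three proposers and the default three layers, each refinement round passes all three previous outputs to every agent, mirroring the auxiliary-context mechanism of the full implementation above.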