====== Music Composition Agents ======

Multi-agent LLM systems for music composition deploy role-specialized agents for melody, harmony, accompaniment, and quality review, collaboratively producing symbolic music through iterative feedback loops.

===== Overview =====

Music composition requires coordinating multiple musical elements: melody, harmony, rhythm, instrumentation, and structure. Multi-agent LLM frameworks address this by assigning specialized roles to different agents, mirroring how human composers and arrangers collaborate. CoComposer(([[https://arxiv.org/abs/2509.00132|"CoComposer: Multi-Agent Collaborative Music Composition with LLMs." arXiv:2509.00132, 2025.]])) uses five role-specialized agents for iterative symbolic music creation in ABC notation, while WeaveMuse(([[https://arxiv.org/abs/2509.11183|"WeaveMuse: Open Multi-Agent Framework for Multimodal Music." arXiv:2509.11183, 2025.]])) provides an open framework for multimodal music tasks spanning text, notation, audio, and visual modalities.
===== CoComposer: Multi-Agent Collaborative Composition =====

CoComposer deploys five specialized agents using AutoGen group chats:

  * **Leader Agent**: analyzes user prompts and decomposes them into musical specifications (title, genre, key, chord progression, instruments, tempo, rhythm)
  * **Melody Agent**: generates the main melody in ABC notation, respecting the MIDI instrument, tempo, rhythm, key, and genre parameters
  * **Accompaniment Agent**: creates harmony and supporting parts synchronized with the melody, handling chord progressions and rhythmic alignment
  * **Revision Agent**: receives feedback and applies targeted modifications to improve composition quality
  * **Review Agent**: evaluates the overall composition against quality criteria and provides structured feedback

The compositional workflow can be modeled as an iterative function:

$$f: P \to (M, A), \quad R(M, A) \to (M', A')$$

where $P$ is the prompt, $M$ is the melody, $A$ is the accompaniment, and $R$ is the review feedback function that drives iterative refinement.

**Streamlined design**: CoComposer uses five agents (versus ComposerX's six), eliminating the separate Instrument agent to reduce communication rounds while maintaining quality.

===== WeaveMuse: Open Multimodal Framework =====

WeaveMuse supports multimodal music tasks through:

  * **Specialist Agents**: interpret requirements and validate outputs across formats (ABC, MIDI, audio)
  * **Manager Agent**: selects tools, sequences actions, maintains state, and handles user turns
  * **Intermodal Loops**: analysis-synthesis-render cycles across text, symbolic notation, audio, and visual modalities

WeaveMuse emphasizes reproducibility with interchangeable open-source models and supports cross-format constraint validation for rhythmic and harmonic coherence.
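For readers unfamiliar with the symbolic format the agents exchange, the sketch below builds a minimal ABC notation fragment in Python. The tune content is illustrative only (not taken from either paper); the header fields shown are standard ABC notation.

<code python>
# Hypothetical example: a two-bar C-major melody of the kind a
# melody agent might emit in ABC notation.
def example_melody_abc() -> str:
    """Return a minimal melody fragment in ABC notation."""
    return "\n".join([
        "X:1",                # tune number
        "T:Example Melody",   # title
        "M:4/4",              # time signature
        "L:1/8",              # default note length (eighth notes)
        "Q:1/4=120",          # tempo: 120 quarter notes per minute
        "K:C",                # key of C major
        "CDEF GABc | c2 G2 E2 C2 |]",  # two bars of melody
    ])

print(example_melody_abc())
</code>

Because ABC is plain text, agents can read, critique, and edit each other's output directly in the chat transcript, which is what makes it a convenient interchange format for LLM-based composition.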
===== Quality Evaluation =====

CoComposer is evaluated using AudioBox-Aesthetics on four criteria:

$$Q_{total} = w_{CE} \cdot Q_{CE} + w_{CU} \cdot Q_{CU} + w_{PC} \cdot Q_{PC} + w_{PQ} \cdot Q_{PQ}$$

where CE = Content Enjoyment, CU = Content Usefulness, PC = Production Complexity, and PQ = Production Quality.

===== Code Example =====

<code python>
from dataclasses import dataclass, field
from enum import Enum


class AgentRole(Enum):
    LEADER = "leader"
    MELODY = "melody"
    ACCOMPANIMENT = "accompaniment"
    REVISION = "revision"
    REVIEW = "review"


@dataclass
class MusicSpec:
    title: str
    genre: str
    key: str
    tempo: int
    time_signature: str
    chord_progression: list[str] = field(default_factory=list)
    instruments: list[str] = field(default_factory=list)


class CoComposerSystem:
    def __init__(self, llm_model: str = "gpt-4o"):
        # One LLM-backed agent per role (MusicAgent wraps the LLM calls).
        self.agents = {role: MusicAgent(role, llm_model) for role in AgentRole}

    def compose(self, user_prompt: str, max_iterations: int = 3) -> str:
        # Leader decomposes the prompt into a structured specification.
        spec = self.agents[AgentRole.LEADER].decompose(user_prompt)
        melody_abc = self.agents[AgentRole.MELODY].generate(spec)
        accomp_abc = self.agents[AgentRole.ACCOMPANIMENT].generate(spec, melody_abc)

        # Review-revision loop: stop early once quality passes the threshold.
        for iteration in range(max_iterations):
            review = self.agents[AgentRole.REVIEW].evaluate(
                melody_abc, accomp_abc, spec
            )
            if review.score >= 0.85:
                break
            melody_abc, accomp_abc = self.agents[AgentRole.REVISION].revise(
                melody_abc, accomp_abc, review.feedback
            )
        return self.merge_abc(melody_abc, accomp_abc)

    def merge_abc(self, melody: str, accomp: str) -> str:
        # Combine both voices into a single multi-voice ABC score.
        return (
            "X:1\n%%score 1 2\n"
            f'V:1 name="Melody"\n{melody}\n'
            f'V:2 name="Accompaniment"\n{accomp}'
        )
</code>

===== Architecture =====

<code>
graph TD
    A[User Prompt] --> B[Leader Agent]
    B --> C[Music Specification]
    C --> D[Melody Agent]
    C --> E[Accompaniment Agent]
    D --> F[Melody - ABC Notation]
    E --> G[Accompaniment - ABC Notation]
    F --> H[Review Agent]
    G --> H
    H --> I{Quality Threshold?}
    I -->|Pass| J[Final Composition]
    I -->|Fail| K[Revision Agent]
    K --> L[Feedback-Driven Edits]
    L --> D
    L --> E
    J --> M[ABC to MIDI]
    M --> N[Audio Rendering]

    subgraph WeaveMuse Extension
        O[Manager Agent] --> P[Text Analysis]
        O --> Q[Notation Validation]
        O --> R[Audio Synthesis]
        O --> S[Visual Score]
    end
</code>

===== Key Results =====

^ Metric ^ CoComposer ^ ComposerX ^ Single Agent ^
| Generation success rate | 100% | 100% | 100% |
| Content Enjoyment (CE) | Higher | Baseline | Lower |
| Production Complexity (PC) | Higher | Baseline | Lower |
| Production Quality (PQ) | Higher | Baseline | Comparable |
| Agent count | 5 | 6 | 1 |
| Communication rounds | Fewer | More | None |

===== See Also =====

  * [[image_editing_agents|Image Editing Agents]]
  * [[video_editing_agents|Video Editing Agents]]
  * [[game_playing_agents|Game Playing Agents]]

===== References =====