====== Music Composition Agents ======
Multi-agent LLM systems for music composition deploy role-specialized agents for melody, harmony, accompaniment, and quality review, collaboratively producing symbolic music through iterative feedback loops.
===== Overview =====
Music composition requires coordinating multiple musical elements: melody, harmony, rhythm, instrumentation, and structure. Multi-agent LLM frameworks address this by assigning specialized roles to different agents, mirroring how human composers and arrangers collaborate. CoComposer(([[https://arxiv.org/abs/2509.00132|"CoComposer: Multi-Agent Collaborative Music Composition with LLMs." arXiv:2509.00132, 2025.]])) uses five role-specialized agents for iterative symbolic music creation in ABC notation, while WeaveMuse(([[https://arxiv.org/abs/2509.11183|"WeaveMuse: Open Multi-Agent Framework for Multimodal Music." arXiv:2509.11183, 2025.]])) provides an open framework for multimodal music tasks spanning text, notation, audio, and visual modalities.
===== CoComposer: Multi-Agent Collaborative Composition =====
CoComposer deploys five specialized agents using AutoGen group chats:
* **Leader Agent**: Analyzes user prompts and decomposes them into musical specifications (title, genre, key, chord progression, instruments, tempo, rhythm)
* **Melody Agent**: Generates the main melody in ABC notation with MIDI instrument info, tempo, rhythm, key, and genre parameters
* **Accompaniment Agent**: Creates harmony and supporting parts synchronized with the melody, handling chord progressions and rhythmic alignment
* **Revision Agent**: Receives feedback and applies targeted modifications to improve composition quality
* **Review Agent**: Evaluates the overall composition against quality criteria and provides structured feedback
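The division of labor above can be sketched as a set of role-specific system prompts. The wording below is hypothetical (the paper's actual prompts are not reproduced here); only the five-role split is from the source:

```python
# Hypothetical system prompts for the five CoComposer roles.
# The exact wording is an assumption; the role split follows the paper.
ROLE_PROMPTS = {
    "leader": (
        "Decompose the user's request into a musical specification: "
        "title, genre, key, chord progression, instruments, tempo, rhythm."
    ),
    "melody": (
        "Write the main melody in ABC notation, following the given "
        "key, tempo, rhythm, genre, and MIDI instrument."
    ),
    "accompaniment": (
        "Write harmony and supporting voices in ABC notation, "
        "synchronized bar-by-bar with the provided melody."
    ),
    "revision": (
        "Apply the reviewer's feedback as targeted edits to the melody "
        "and accompaniment without rewriting them wholesale."
    ),
    "review": (
        "Evaluate the composition against the specification and return "
        "structured feedback plus a quality score in [0, 1]."
    ),
}
```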
The compositional workflow can be modeled as an iterative function:
$$ f: P \to (M, A), \quad R(M, A) \to (M', A') $$
where $P$ is the prompt, $M$ is melody, $A$ is accompaniment, and $R$ is the review feedback function that drives iterative refinement.
**Streamlined Design**: CoComposer uses 5 agents (vs. ComposerX's 6), eliminating the separate Instrument agent to reduce communication rounds while maintaining quality.
===== WeaveMuse: Open Multimodal Framework =====
WeaveMuse supports multimodal music tasks through:
* **Specialist Agents**: Interpret requirements, validate outputs across formats (ABC, MIDI, audio)
* **Manager Agent**: Selects tools, sequences actions, maintains state, handles user turns
* **Intermodal Loops**: Analysis-synthesis-render cycles across text, symbolic notation, audio, and visual modalities
WeaveMuse emphasizes reproducibility with interchangeable open-source models and supports cross-format constraint validation for rhythmic and harmonic coherence.
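Cross-format rhythmic validation of the kind WeaveMuse performs can be illustrated with a minimal bar-count check on two ABC voices. This is a standard-library sketch, not WeaveMuse's actual validator, which handles far more than bar alignment:

```python
def bar_counts(abc_body: str) -> list[int]:
    """Count bars per music line of an ABC voice body, skipping header fields."""
    counts = []
    for line in abc_body.splitlines():
        line = line.strip()
        # Skip blank lines and header fields such as "K:C" or "M:4/4".
        if not line or (len(line) > 1 and line[1] == ":" and line[0].isalpha()):
            continue
        # Each non-empty segment between "|" delimiters is one bar.
        counts.append(len([b for b in line.split("|") if b.strip()]))
    return counts

def rhythmically_aligned(melody: str, accompaniment: str) -> bool:
    """Two voices are coherent only if their total bar counts match."""
    return sum(bar_counts(melody)) == sum(bar_counts(accompaniment))

melody = "K:C\n|CDEF GABc|cBAG FEDC|"
accomp = "K:C\n|[CEG]4 [CEG]4|[CEG]4 [CEG]4|"
assert rhythmically_aligned(melody, accomp)
```

A real validator would additionally check note durations against the meter and harmonic content against the chord progression; bar counting is only the cheapest coherence gate.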
===== Quality Evaluation =====
CoComposer is evaluated using AudioBox-Aesthetics on four criteria:
$$ Q_{total} = w_{CE} \cdot Q_{CE} + w_{CU} \cdot Q_{CU} + w_{PC} \cdot Q_{PC} + w_{PQ} \cdot Q_{PQ} $$
where CE = Content Enjoyment, CU = Content Usefulness, PC = Production Complexity, PQ = Production Quality.
===== Code Example =====
<code python>
from dataclasses import dataclass, field
from enum import Enum

class AgentRole(Enum):
    LEADER = "leader"
    MELODY = "melody"
    ACCOMPANIMENT = "accompaniment"
    REVISION = "revision"
    REVIEW = "review"

@dataclass
class MusicSpec:
    title: str
    genre: str
    key: str
    tempo: int
    time_signature: str
    chord_progression: list[str] = field(default_factory=list)
    instruments: list[str] = field(default_factory=list)

class CoComposerSystem:
    def __init__(self, llm_model: str = "gpt-4o"):
        # MusicAgent wraps one LLM-backed role (definition omitted here).
        self.agents = {role: MusicAgent(role, llm_model) for role in AgentRole}

    def compose(self, user_prompt: str, max_iterations: int = 3) -> str:
        # 1. Leader decomposes the prompt into a MusicSpec.
        spec = self.agents[AgentRole.LEADER].decompose(user_prompt)
        # 2. Melody first, then accompaniment synchronized to it.
        melody_abc = self.agents[AgentRole.MELODY].generate(spec)
        accomp_abc = self.agents[AgentRole.ACCOMPANIMENT].generate(spec, melody_abc)
        # 3. Review/revise loop until the quality threshold is met.
        for _ in range(max_iterations):
            review = self.agents[AgentRole.REVIEW].evaluate(
                melody_abc, accomp_abc, spec
            )
            if review.score >= 0.85:
                break
            melody_abc, accomp_abc = self.agents[AgentRole.REVISION].revise(
                melody_abc, accomp_abc, review.feedback
            )
        return self.merge_abc(melody_abc, accomp_abc)

    def merge_abc(self, melody: str, accomp: str) -> str:
        # Combine both voices into a single two-staff ABC tune.
        return (
            "X:1\n%%score 1 2\n"
            f'V:1 name="Melody"\n{melody}\n'
            f'V:2 name="Accompaniment"\n{accomp}'
        )
</code>
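The ''MusicSpec'' fields map directly onto standard ABC header fields. The helper below (a sketch, not part of CoComposer) makes the correspondence explicit:

```python
def spec_to_abc_header(title: str, key: str, tempo: int,
                       time_signature: str) -> str:
    """Render the core ABC tune header from specification fields.

    Mapping (per the ABC notation standard):
      X: reference number, T: title, M: meter,
      Q: tempo (quarter notes per minute), K: key (last header field).
    """
    return (
        f"X:1\n"
        f"T:{title}\n"
        f"M:{time_signature}\n"
        f"Q:1/4={tempo}\n"
        f"K:{key}"
    )

# Hypothetical specification values for illustration.
header = spec_to_abc_header("Evening Study", "Dm", 96, "3/4")
```

Placing ''K:'' last matters: in ABC, the key field terminates the header, and anything after it is parsed as the tune body.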
===== Architecture =====
<code>
graph TD
    A[User Prompt] --> B[Leader Agent]
    B --> C[Music Specification]
    C --> D[Melody Agent]
    C --> E[Accompaniment Agent]
    D --> F[Melody - ABC Notation]
    E --> G[Accompaniment - ABC Notation]
    F --> H[Review Agent]
    G --> H
    H --> I{Quality Threshold?}
    I -->|Pass| J[Final Composition]
    I -->|Fail| K[Revision Agent]
    K --> L[Feedback-Driven Edits]
    L --> D
    L --> E
    J --> M[ABC to MIDI]
    M --> N[Audio Rendering]
    subgraph WeaveMuse Extension
        O[Manager Agent] --> P[Text Analysis]
        O --> Q[Notation Validation]
        O --> R[Audio Synthesis]
        O --> S[Visual Score]
    end
</code>
===== Key Results =====
^ Metric ^ CoComposer ^ ComposerX ^ Single Agent ^
| Generation success rate | 100% | 100% | 100% |
| Content Enjoyment (CE) | Higher | Baseline | Lower |
| Production Complexity (PC) | Higher | Baseline | Lower |
| Production Quality (PQ) | Higher | Baseline | Comparable |
| Agent count | 5 | 6 | 1 |
| Communication rounds | Fewer | More | None |
===== See Also =====
* [[image_editing_agents|Image Editing Agents]]
* [[video_editing_agents|Video Editing Agents]]
* [[game_playing_agents|Game Playing Agents]]
===== References =====