This is an old revision of the document!

Music Composition Agents

Multi-agent LLM systems for music composition deploy role-specialized agents for melody, harmony, accompaniment, and quality review, collaboratively producing symbolic music through iterative feedback loops.

Overview

Music composition requires coordinating multiple musical elements: melody, harmony, rhythm, instrumentation, and structure. Multi-agent LLM frameworks address this by assigning specialized roles to different agents, mirroring how human composers and arrangers collaborate. CoComposer¹⁾ uses five role-specialized agents for iterative symbolic music creation in ABC notation, while WeaveMuse²⁾ provides an open framework for multimodal music tasks spanning text, notation, audio, and visual modalities.

CoComposer: Multi-Agent Collaborative Composition

CoComposer deploys five specialized agents using AutoGen group chats:

Leader Agent: Analyzes user prompts and decomposes them into musical specifications (title, genre, key, chord progression, instruments, tempo, rhythm)
Melody Agent: Generates the main melody in ABC notation with MIDI instrument info, tempo, rhythm, key, and genre parameters
Accompaniment Agent: Creates harmony and supporting parts synchronized with the melody, handling chord progressions and rhythmic alignment
Revision Agent: Receives feedback and applies targeted modifications to improve composition quality
Review Agent: Evaluates the overall composition against quality criteria and provides structured feedback

The compositional workflow can be modeled as an iterative function:

where $P$ is the prompt, $M$ is melody, $A$ is accompaniment, and $R$ is the review feedback function that drives iterative refinement.

Streamlined Design: CoComposer uses 5 agents (vs. ComposerX's 6), eliminating the separate Instrument agent to reduce communication rounds while maintaining quality.

WeaveMuse: Open Multimodal Framework

WeaveMuse supports multimodal music tasks through:

Specialist Agents: Interpret requirements, validate outputs across formats (ABC, MIDI, audio)
Manager Agent: Selects tools, sequences actions, maintains state, handles user turns
Intermodal Loops: Analysis-synthesis-render cycles across text, symbolic notation, audio, and visual modalities

WeaveMuse emphasizes reproducibility with interchangeable open-source models and supports cross-format constraint validation for rhythmic and harmonic coherence.

Quality Evaluation

CoComposer is evaluated using AudioBox-Aesthetics on four criteria:

<latex>Q_{total} = w_{CE} \cdot Q_{CE} + w_{CU} \cdot Q_{CU} + w_{PC} \cdot Q_{PC} + w_{PQ} \cdot Q_{PQ}</latex>

where CE = Content Enjoyment, CU = Content Usefulness, PC = Production Complexity, PQ = Production Quality.

Code Example

from dataclasses import dataclass, field
from enum import Enum
 
class AgentRole(Enum):
    LEADER = "leader"
    MELODY = "melody"
    ACCOMPANIMENT = "accompaniment"
    REVISION = "revision"
    REVIEW = "review"
 
@dataclass
class MusicSpec:
    title: str
    genre: str
    key: str
    tempo: int
    time_signature: str
    chord_progression: list[str] = field(default_factory=list)
    instruments: list[str] = field(default_factory=list)
 
class CoComposerSystem:
    def __init__(self, llm_model: str = "gpt-4o"):
        self.agents = {
            role: MusicAgent(role, llm_model)
            for role in AgentRole
        }
 
    def compose(self, user_prompt: str,
                max_iterations: int = 3) -> str:
        spec = self.agents[AgentRole.LEADER].decompose(
            user_prompt
        )
        melody_abc = self.agents[AgentRole.MELODY].generate(
            spec
        )
        accomp_abc = self.agents[AgentRole.ACCOMPANIMENT].generate(
            spec, melody_abc
        )
        for iteration in range(max_iterations):
            review = self.agents[AgentRole.REVIEW].evaluate(
                melody_abc, accomp_abc, spec
            )
            if review.score >= 0.85:
                break
            melody_abc, accomp_abc = (
                self.agents[AgentRole.REVISION].revise(
                    melody_abc, accomp_abc, review.feedback
                )
            )
        return self.merge_abc(melody_abc, accomp_abc)
 
    def merge_abc(self, melody: str, accomp: str) -> str:
        return f"X:1\n%%score 1 2\n" \
               f"V:1 name=\"Melody\"\n{melody}\n" \
               f"V:2 name=\"Accompaniment\"\n{accomp}"

Architecture

graph TD A[User Prompt] --> B[Leader Agent] B --> C[Music Specification] C --> D[Melody Agent] C --> E[Accompaniment Agent] D --> F[Melody - ABC Notation] E --> G[Accompaniment - ABC Notation] F --> H[Review Agent] G --> H H --> I{Quality Threshold?} I -->|Pass| J[Final Composition] I -->|Fail| K[Revision Agent] K --> L[Feedback-Driven Edits] L --> D L --> E J --> M[ABC to MIDI] M --> N[Audio Rendering] subgraph WeaveMuse Extension O[Manager Agent] --> P[Text Analysis] O --> Q[Notation Validation] O --> R[Audio Synthesis] O --> S[Visual Score] end

Key Results

Metric	CoComposer	ComposerX	Single Agent
Generation success rate	100%	100%	100%
Content Enjoyment (CE)	Higher	Baseline	Lower
Production Complexity (PC)	Higher	Baseline	Lower
Production Quality (PQ)	Higher	Baseline	Comparable
Agent count	5	6	1
Communication rounds	Fewer	More	None

AI Agent Knowledge Base

Sidebar

Table of Contents

Music Composition Agents

Overview

CoComposer: Multi-Agent Collaborative Composition

WeaveMuse: Open Multimodal Framework

Quality Evaluation

Code Example

Architecture

Key Results

References

See Also

AI Agent Knowledge Base

User Tools

Site Tools

Sidebar

Table of Contents

Music Composition Agents

Overview

CoComposer: Multi-Agent Collaborative Composition

WeaveMuse: Open Multimodal Framework

Quality Evaluation

Code Example

Architecture

Key Results

References

See Also

Page Tools