====== Synthetic Data Generation Agents ======

**Synthetic Data Generation Agents** are agentic AI pipelines that autonomously create high-quality training datasets by decomposing complex data generation tasks into manageable subtasks executed by specialized LLM-based agents. The **AgentSynth** framework, published as a conference paper at **ICLR 2026**, demonstrates this approach for generating diverse computer-use task trajectories at scale.

===== Overview =====

Training capable AI agents requires large volumes of high-quality, diverse task data with corresponding trajectories. Human annotation is expensive (often hundreds of dollars per trajectory) and difficult to scale. Agentic synthetic data generation addresses this by leveraging **information asymmetry** — the principle that executing a task step-by-step is significantly easier than reasoning about the complete solution at once. By decomposing generation into forward-execution subtasks, agentic pipelines produce datasets that are simple to create but challenging to solve, providing both training data and discriminative benchmarks.

===== AgentSynth Framework =====

AgentSynth is a scalable, cost-efficient pipeline for automatically synthesizing task and trajectory datasets for generalist computer-use agents. It was developed at UC Berkeley by Jingxu Xie, Dylan Xu, Xuandong Zhao, and Dawn Song.
=== Multi-Agent Architecture ===

AgentSynth deploys six distinct LLM-based agents in a coordinated pipeline:

  * **Task Proposer** — Generates candidate task descriptions based on available tools and environments
  * **Task Executor** — Attempts to execute proposed tasks, producing action trajectories
  * **Task Verifier** — Validates that executed trajectories correctly complete the proposed task
  * **Task Reviser** — Refines failed or ambiguous tasks based on execution feedback
  * **Follow-up Task Proposer** — Generates compositional follow-up tasks that build on completed subtasks
  * **Task Summarizer** — Produces clean task descriptions and metadata for the final dataset

=== Difficulty Modulation ===

A key innovation is precise control over task complexity by varying the number of composed subtasks. Each individual subtask is straightforward, but chaining them creates increasingly challenging long-horizon tasks:

^ Difficulty Level ^ Subtasks ^ Agent Success Rate ^
| Level 1 | 1 | 18% |
| Level 2 | 2 | 12% |
| Level 3 | 3 | 8% |
| Level 6 | 6 | 4% |

This steep performance degradation demonstrates the benchmark's discriminative power and highlights substantial room for agent improvement.
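As a quick sanity check on the table above, the sketch below compares the reported success rates with a naive baseline in which every subtask succeeds independently with the Level 1 rate. The independence assumption is ours, purely for illustration; it is not part of the AgentSynth framework.

```python
# Reported AgentSynth success rates by difficulty level (from the table above).
reported = {1: 0.18, 2: 0.12, 3: 0.08, 6: 0.04}

# Naive baseline (our assumption): each subtask succeeds independently with
# the Level 1 rate, so a chain of n subtasks succeeds with probability p ** n.
p = reported[1]

for level, observed in reported.items():
    predicted = p ** level
    print(f"Level {level}: observed {observed:.0%}, "
          f"independence baseline predicts {predicted:.3%}")
```

The observed rates fall considerably more gently than the ''p ** n'' baseline would predict, so the baseline is useful only as a reference point when reading the table, not as a model of agent behavior.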
===== Code Example =====

Simplified agentic synthetic data generation pipeline:

<code python>
from dataclasses import dataclass


@dataclass
class SubTask:
    description: str
    tools_required: list[str]
    trajectory: list[dict]
    verified: bool = False


class AgentSynthPipeline:
    def __init__(self, llm_client, environment):
        self.llm = llm_client
        self.env = environment

    def propose_subtask(self, context: dict) -> SubTask:
        # Task Proposer: ask the LLM for a simple task grounded in the
        # currently available tools.
        prompt = f"Propose a simple computer task using: {context['available_tools']}"
        response = self.llm.generate(prompt)
        return SubTask(
            description=response.text,
            tools_required=response.tools,
            trajectory=[],
        )

    def execute_subtask(self, subtask: SubTask) -> SubTask:
        # Task Executor: roll the task out in the environment step by step.
        trajectory = []
        state = self.env.reset()
        for step in range(self.env.max_steps):
            action = self.llm.generate(
                f"Task: {subtask.description}\nState: {state}\nNext action:"
            )
            # Record the state the action was taken in, then advance.
            trajectory.append({"state": state, "action": action})
            state, done = self.env.step(action)
            if done:
                break
        subtask.trajectory = trajectory
        return subtask

    def verify_subtask(self, subtask: SubTask) -> bool:
        # Task Verifier: the LLM judges whether the trajectory completed
        # the task.
        verification = self.llm.generate(
            f"Did this trajectory complete the task?\n"
            f"Task: {subtask.description}\n"
            f"Trajectory: {subtask.trajectory}"
        )
        subtask.verified = verification.text.lower().startswith("yes")
        return subtask.verified

    def compose_tasks(self, subtasks: list[SubTask], difficulty: int) -> dict:
        # Difficulty modulation: chain `difficulty` subtasks into one
        # long-horizon task.
        selected = subtasks[:difficulty]
        composed_description = " Then, ".join(s.description for s in selected)
        composed_trajectory = []
        for s in selected:
            composed_trajectory.extend(s.trajectory)
        return {
            "description": composed_description,
            "difficulty": difficulty,
            "trajectory": composed_trajectory,
            "num_subtasks": len(selected),
        }
</code>

===== Cost Efficiency =====

AgentSynth achieves an average cost of **$0.60 per trajectory**, orders of magnitude cheaper than human annotation. Over 6,000 diverse and realistic tasks were generated using the pipeline, integrated with the OSWorld environment for authentic computer tool interactions.
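The cost figures above work out as follows. The $300 human-annotation figure is purely an illustrative placeholder standing in for the "hundreds of dollars per trajectory" mentioned in the Overview; the other two numbers are the ones reported above.

```python
SYNTH_COST = 0.60   # USD per AgentSynth trajectory (reported)
HUMAN_COST = 300.0  # USD per human-annotated trajectory (our placeholder)
N_TASKS = 6000      # dataset size reported for AgentSynth

synthetic_total = SYNTH_COST * N_TASKS
human_total = HUMAN_COST * N_TASKS
print(f"Synthetic: ${synthetic_total:,.0f} vs. human: ${human_total:,.0f} "
      f"({human_total / synthetic_total:.0f}x more expensive)")
```

Under this placeholder, annotating the full 6,000-task dataset by hand would cost roughly 500 times as much as generating it synthetically.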
===== Broader Ecosystem =====

Other frameworks complement AgentSynth in the synthetic data generation space:

  * **NVIDIA NeMo** — Infrastructure for configuring seed datasets, column structures, and LLM-prompted generation with quality evaluation
  * **Tonic Fabricate** — Conversational agentic interface for natural-language dataset specification with real-time generation
  * **Schema-Aware Generation** — Maintaining referential integrity across related tables while generating statistically consistent synthetic records

===== References =====

  * [[https://arxiv.org/abs/2506.14205|arXiv:2506.14205 — AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents]]
  * [[https://openreview.net/forum?id=CoBxmXThM6|OpenReview — AgentSynth (ICLR 2026)]]
  * [[https://github.com/sunblaze-ucb/AgentSynth|AgentSynth GitHub Repository]]
  * [[https://iclr.cc/virtual/2026/poster/10010827|ICLR 2026 Poster — AgentSynth]]

===== See Also =====

  * [[self_evolving_agents|Self-Evolving Agents]]
  * [[agentic_skills|Agentic Skills]]
  * [[synthetic_data|Synthetic Data for AI Training]]
  * [[computer_use_agents|Computer Use Agents]]