Self-Evolving Agents

Self-Evolving Agents are AI agent systems that autonomously accumulate, refine, and reuse executable capabilities from task experience. The AgentFactory framework, introduced in arXiv:2603.18000, formalizes this paradigm through a three-phase lifecycle where agents decompose tasks into reusable sub-agents stored as executable Python code, enabling continuous capability growth without human intervention.

Overview

Traditional LLM-based agents treat each task independently, losing accumulated knowledge between sessions. Reflection-based approaches store textual summaries of past experiences but cannot reliably re-execute learned procedures. Self-evolving agents solve this by storing learned capabilities as executable code — specifically, standalone Python modules with standardized documentation that can be retrieved, tested, modified, and deployed across systems.

The key insight is that code-based skill storage makes re-execution repeatable: a saved module runs the same procedure every time it is called, even in complex scenarios, whereas text-based methods depend on the LLM correctly interpreting and re-implementing a strategy from a natural-language description.

AgentFactory Architecture

AgentFactory consists of three core components:

Meta-Agent Orchestrator — Oversees task decomposition, sub-agent retrieval and reuse, execution feedback analysis, and code modification
Skill System — Manages the persistent library of saved, executable sub-agents with standardized interfaces
Workspace Manager — Provides isolated directories for safe task execution, code testing, and promotion of improved sub-agents to the library
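
The division of responsibilities among these components can be sketched as three small classes. This is a minimal illustration under stated assumptions, not AgentFactory's actual API: the class and method names and the one-file-per-skill layout are hypothetical, and LLM-driven task decomposition is left out (subtask names are simply passed in).

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class SkillSystem:
    """Persistent library of sub-agents (assumed layout: one .py file per skill)."""
    root: Path

    def save(self, name: str, code: str) -> Path:
        self.root.mkdir(parents=True, exist_ok=True)
        path = self.root / f"{name}.py"
        path.write_text(code, encoding="utf-8")
        return path

    def load(self, name: str) -> Optional[str]:
        path = self.root / f"{name}.py"
        return path.read_text(encoding="utf-8") if path.exists() else None


class WorkspaceManager:
    """Hands out scratch directories and promotes validated code to the library."""

    def __init__(self, skills: SkillSystem) -> None:
        self.skills = skills

    def new_workspace(self) -> Path:
        return Path(tempfile.mkdtemp(prefix="ws_"))

    def promote(self, workspace: Path, name: str) -> Path:
        code = (workspace / f"{name}.py").read_text(encoding="utf-8")
        return self.skills.save(name, code)


class MetaAgent:
    """Decides, per subtask, whether to reuse a saved sub-agent or generate one."""

    def __init__(self, skills: SkillSystem, workspaces: WorkspaceManager) -> None:
        self.skills = skills
        self.workspaces = workspaces

    def plan(self, subtasks: List[str]) -> Dict[str, str]:
        return {
            name: "reuse" if self.skills.load(name) is not None else "generate"
            for name in subtasks
        }
```

In this sketch the library is only ever written through SkillSystem.save, and the only path from a workspace into the library is WorkspaceManager.promote.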

Three-Phase Lifecycle

1. Install Phase

When encountering a new type of task, the Meta-Agent decomposes it into reusable subtasks (e.g., scheduling meetings, file manipulation, data parsing). For each subtask, it generates a specialized sub-agent as executable Python code with documentation. Successful sub-agents are saved to the persistent skill library.
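
One way the save step might look is sketched below. The header fields mirror the docstring format used in the Code Example section; the function names (parse_header, install), the WordCounter sub-agent, and the structural check that stands in for "the sub-agent succeeded on its task" are all assumptions made for illustration.

```python
import ast
import tempfile
from pathlib import Path
from typing import Dict

# Stand-in for code the Meta-Agent would generate; the contents are illustrative.
GENERATED_CODE = '''"""
Sub-Agent: WordCounter
Description: Counts whitespace-separated words in a string.
Version: 1.0
"""

def execute(text: str) -> dict:
    return {"words": len(text.split())}
'''


def parse_header(code: str) -> Dict[str, str]:
    """Read 'Key: value' lines from the module docstring."""
    doc = ast.get_docstring(ast.parse(code)) or ""
    meta: Dict[str, str] = {}
    for line in doc.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip()
    return meta


def install(code: str, library: Path) -> Path:
    """Save a sub-agent only if it has a header and an execute() entry point."""
    meta = parse_header(code)
    has_entry = any(
        isinstance(node, ast.FunctionDef) and node.name == "execute"
        for node in ast.parse(code).body
    )
    if "Sub-Agent" not in meta or not has_entry:
        raise ValueError("sub-agent must declare a header and an execute() function")
    library.mkdir(parents=True, exist_ok=True)
    path = library / f"{meta['Sub-Agent']}.py"
    path.write_text(code, encoding="utf-8")
    return path
```

Keeping the metadata in the module docstring means the saved file stays a single self-describing unit that a retrieval step can index without importing or running the code.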

2. Self-Evolve Phase

For similar subsequent tasks, the system retrieves relevant saved sub-agents. If a sub-agent fails or underperforms, the Meta-Agent:

Analyzes execution feedback and error traces
Generates targeted code improvements
Modifies the sub-agent code
Validates the change against the current task in an isolated workspace
Promotes the improved version to the library

This “generate-feedback-modify” loop, inspired by Self-Refine but applied to executable code, makes sub-agents progressively more robust and general-purpose.
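
The loop can be sketched in a few lines. Everything here is an assumption made for illustration: run_candidate executes code in-process rather than in an isolated workspace, and toy_fix is a hard-coded stand-in for the LLM call that would propose a modification from the error trace.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_candidate(code: str, task_input: str) -> Tuple[bool, str]:
    """Run the sub-agent's execute() and return (ok, feedback)."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # in-process for brevity; a real system would isolate this
        result = namespace["execute"](task_input)
        return True, repr(result)
    except Exception:
        return False, traceback.format_exc()


def self_evolve(
    code: str,
    task_input: str,
    propose_fix: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return code that passes the current task, or None if every round fails."""
    for attempt in range(max_rounds + 1):
        ok, feedback = run_candidate(code, task_input)
        if ok:
            return code  # only validated code is handed back for promotion
        if attempt < max_rounds:
            code = propose_fix(code, feedback)
    return None


# A fragile first version and a hypothetical fixer that patches one failure mode.
FRAGILE_V1 = "def execute(text):\n    return int(text)\n"


def toy_fix(code: str, feedback: str) -> str:
    if "ValueError" in feedback:
        return "def execute(text):\n    return int(text.strip().replace(',', ''))\n"
    return code


evolved = self_evolve(FRAGILE_V1, "1,024 ", toy_fix)
```

Note that the function never returns unvalidated code: a proposed fix is only accepted after it has passed the current task.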

3. Deploy Phase

Mature sub-agents are exported as standalone Python modules, portable across any Python-capable AI system. This enables cross-system reuse — a sub-agent refined on one platform can be deployed to another without modification.
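
A host system could load such a module with the standard library's importlib, as in the sketch below. The load_sub_agent helper and the word_counter module are hypothetical; the only convention assumed is the execute() entry point shown in the Code Example section.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_sub_agent(path: Path) -> ModuleType:
    """Import a standalone sub-agent module from a file path."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load sub-agent from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not callable(getattr(module, "execute", None)):
        raise AttributeError(f"{path.name} does not define execute()")
    return module


# Example: export a toy sub-agent, then load it as a host system might.
export_dir = Path(tempfile.mkdtemp())
exported = export_dir / "word_counter.py"
exported.write_text(
    'def execute(text: str) -> dict:\n    return {"words": len(text.split())}\n',
    encoding="utf-8",
)
agent = load_sub_agent(exported)
print(agent.execute("portable across systems"))  # {'words': 3}
```

Portability in this sense still requires that the module's declared dependencies are installed on the target system.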

Code Example

Structure of an AgentFactory sub-agent:

"""
Sub-Agent: CSVDataParser
Description: Robust CSV parsing with automatic encoding detection,
             delimiter inference, and error recovery.
Version: 3.2 (evolved from v1.0 through 4 refinement cycles)
Dependencies: pandas, chardet
"""
import pandas as pd
import chardet
 
class CSVDataParser:
    def __init__(self):
        self.supported_delimiters = [",", "\t", ";", "|"]
 
    def detect_encoding(self, file_path: str) -> str:
        with open(file_path, "rb") as f:
            result = chardet.detect(f.read(10000))
        # chardet reports None when it cannot guess (e.g. an empty file); fall back to UTF-8
        return result["encoding"] or "utf-8"
 
    def infer_delimiter(self, file_path: str, encoding: str) -> str:
        with open(file_path, "r", encoding=encoding) as f:
            sample = f.read(5000)
        return max(self.supported_delimiters, key=lambda d: sample.count(d))
 
    def parse(self, file_path: str) -> pd.DataFrame:
        encoding = self.detect_encoding(file_path)
        delimiter = self.infer_delimiter(file_path, encoding)
        return pd.read_csv(
            file_path,
            encoding=encoding,
            delimiter=delimiter,
            on_bad_lines="warn"
        )
 
# Meta-agent interface
def execute(file_path: str) -> dict:
    parser = CSVDataParser()
    df = parser.parse(file_path)
    return {"rows": len(df), "columns": list(df.columns), "data": df}

Key Results

Continuous Improvement — Sub-agents evolve from fragile initial implementations to production-grade code through iterative feedback cycles
Efficiency Gains — Reusing sub-agents significantly reduces computation for recurring task patterns; the library grows autonomously
Cross-System Portability — Sub-agents transfer across different AI systems as pure Python modules
Failure Isolation — Workspace Manager ensures that failed experiments cannot corrupt the skill library
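
Failure isolation of this kind might be implemented along the following lines. This is a sketch under assumptions, not the paper's implementation: validate_and_promote, its parameters, and the staging-file naming are hypothetical. The candidate is tested in a temporary directory by a separate interpreter, and the library file is replaced only after the check passes.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def validate_and_promote(candidate_code: str, name: str, library: Path, check: str) -> bool:
    """Test candidate code in a scratch directory; touch the library only on success."""
    library.mkdir(parents=True, exist_ok=True)
    workspace = Path(tempfile.mkdtemp(prefix="ws_"))
    try:
        module_path = workspace / f"{name}.py"
        module_path.write_text(candidate_code, encoding="utf-8")
        try:
            # Run the check in a separate interpreter so crashes stay contained.
            proc = subprocess.run(
                [sys.executable, "-c", f"import {name}\n{check}"],
                cwd=workspace,
                env={**os.environ, "PYTHONPATH": str(workspace)},
                capture_output=True,
                text=True,
                timeout=30,
            )
        except subprocess.TimeoutExpired:
            return False
        if proc.returncode != 0:
            return False
        # Stage inside the library directory so os.replace is a same-directory rename.
        staged = library / f".{name}.py.tmp"
        shutil.copyfile(module_path, staged)
        os.replace(staged, library / f"{name}.py")
        return True
    finally:
        shutil.rmtree(workspace, ignore_errors=True)
```

A failed candidate leaves the previously promoted version untouched, because nothing is written to the library directory until the check has succeeded.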
