====== The Rise and Potential of Large Language Model Based Agents: A Survey ======
This landmark survey by Xi et al. (2023)(([[https://arxiv.org/abs/2309.07864|Xi et al. (2023) - The Rise and Potential of Large Language Model Based Agents: A Survey]])) from the Fudan NLP Group provides a comprehensive overview of LLM-based agents, proposing a unifying conceptual framework of **brain, perception, and action** modules. With over 1,500 citations, it is the most influential survey in the LLM agent space.
===== Overview =====
The survey traces the concept of agents from philosophical origins (Descartes, Locke, Hume) through AI history (symbolic AI, reinforcement learning) to the modern era where LLMs serve as the foundation for general-purpose agents.((https://arxiv.org/abs/2309.07864)) The central thesis: LLMs possess the versatile capabilities needed to serve as a **starting point for designing AI agents that can adapt to diverse scenarios**.
Published in Science China Information Sciences (2025)(([[https://doi.org/10.1007/s11432-024-4222-0|Published in Science China Information Sciences, 2025]])), the paper covers single-agent systems, multi-agent cooperation, and human-agent interaction.
===== The Brain-Perception-Action Framework =====
<code>
graph TD
    subgraph Brain
        B1[Natural Language Understanding]
        B2[Knowledge & Memory]
        B3[Reasoning & Planning]
        B4[Transferability & Generalization]
    end
    subgraph Perception
        P1[Textual Input]
        P2[Visual Input]
        P3[Auditory Input]
        P4[Other Modalities]
    end
    subgraph Action
        A1[Textual Output]
        A2[Tool Use]
        A3[Embodied Action]
    end
    P1 --> B1
    P2 --> B1
    P3 --> B1
    P4 --> B1
    B1 --> B3
    B2 --> B3
    B3 --> A1
    B3 --> A2
    B3 --> A3
</code>
===== Brain Module =====
The brain is the LLM itself, providing core cognitive functions:((https://arxiv.org/abs/2309.07864))
* **Natural Language Understanding**: Processing and interpreting inputs
* **Knowledge**: World knowledge encoded in parameters plus external retrieval
* **Memory**: Short-term (in-context) and long-term (external storage) memory systems
* **Reasoning**: Chain-of-thought, decomposition, and multi-step inference
* **Planning**: Task decomposition, plan generation, and plan refinement
* **Transferability**: Adapting to new tasks and domains with minimal examples
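The short-term/long-term memory split listed above can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the bounded deque stands in for the LLM's context window, and keyword-overlap retrieval stands in for embedding similarity search against a vector store.

<code python>
from collections import deque

class AgentMemory:
    """Toy sketch of the survey's memory split: short-term (in-context)
    vs. long-term (external storage). Names and retrieval are illustrative."""

    def __init__(self, window_size=4):
        self.short_term = deque(maxlen=window_size)  # recent turns, like a context window
        self.long_term = []                          # persistent store (stand-in for a vector DB)

    def store(self, text):
        self.short_term.append(text)
        self.long_term.append(text)

    def retrieve(self, query, k=2):
        # Keyword overlap as a stand-in for embedding similarity search
        q = set(query.lower().split())
        scored = sorted(self.long_term,
                        key=lambda t: len(q & set(t.lower().split())),
                        reverse=True)
        return list(self.short_term), scored[:k]

mem = AgentMemory(window_size=2)
for note in ["user likes Python", "meeting at 3pm", "user dislikes Java"]:
    mem.store(note)
recent, relevant = mem.retrieve("what language does the user like?")
</code>

Note that the short-term window evicts the oldest entry once full, while the long-term store keeps everything and is searched on demand.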
The agent's decision at each step can be formalized as:
a_t = \pi_\theta(o_t, m_t, g)
where \pi_\theta is the LLM-based policy, o_t is the current observation, m_t is the memory state, and g is the goal.
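As a sketch, this policy maps directly onto a function that assembles observation, memory, and goal into a single prompt for the LLM. The `llm` callable and prompt format below are stand-ins for illustration, not an API from the paper.

<code python>
def policy(llm, observation, memory, goal):
    """a_t = pi_theta(o_t, m_t, g): the LLM acts as the parameterized policy,
    conditioned on observation, memory state, and goal via the prompt."""
    prompt = (f"Goal: {goal}\n"
              f"Memory: {memory}\n"
              f"Observation: {observation}\n"
              f"Next action:")
    return llm(prompt)

# Stand-in LLM: returns a canned action so the sketch is runnable
fake_llm = lambda prompt: "search('weather Shanghai')" if "weather" in prompt else "noop"
action = policy(fake_llm, "user asked about weather", ["prior turn"], "answer the user")
</code>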
===== Perception Module =====
Perception extends the agent beyond text:
* **Textual**: Natural language instructions, documents, API responses
* **Visual**: Images and video via multimodal LLMs or vision encoders
* **Auditory**: Speech and sound through audio models
* **Other Modalities**: Sensor data, structured data, embodied observations
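One common way to realize such a module is a dispatch table that routes each modality to its own encoder, emitting text the brain can consume. The processors below are trivial stand-ins (a real system would call a vision encoder or speech-to-text model), and the class name is hypothetical.

<code python>
class Perception:
    """Route raw inputs to per-modality processors; all outputs become text for the brain."""

    def __init__(self):
        self.processors = {
            "text":  lambda x: x,                             # pass through
            "image": lambda x: f"[image described as: {x}]",  # stand-in for a vision encoder
            "audio": lambda x: f"[transcript: {x}]",          # stand-in for speech-to-text
        }

    def process(self, modality, payload):
        if modality not in self.processors:
            raise ValueError(f"unsupported modality: {modality}")
        return self.processors[modality](payload)

p = Perception()
obs = p.process("image", "a red traffic light")
</code>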
===== Action Module =====
Actions are the agent's interface with the world:
* **Textual Output**: Dialogue, summarization, code generation
* **Tool Use**: API calls, code execution, database queries
* **Embodied Action**: Physical manipulation in robotics or virtual environments
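Tool use is typically implemented by parsing a structured action emitted by the brain and dispatching it against a tool registry. The JSON action format and the toy tools below are assumptions for illustration, not the survey's specification.

<code python>
import json

class ActionModule:
    """Execute a textual action of the form {"tool": name, "args": {...}}.
    The registry is illustrative; real agents wrap APIs, code runners, or robot controllers."""

    def __init__(self, tools):
        self.tools = tools

    def execute(self, action_json):
        action = json.loads(action_json)
        tool = self.tools[action["tool"]]
        return tool(**action["args"])

tools = {
    # Toy calculator; never eval untrusted input in a real system
    "calculator": lambda expression: eval(expression, {"__builtins__": {}}),
    "echo": lambda text: text,
}
act = ActionModule(tools)
result = act.execute('{"tool": "calculator", "args": {"expression": "6 * 7"}}')
</code>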
===== Agent Application Taxonomy =====
The survey categorizes agent applications into three paradigms:((https://arxiv.org/abs/2309.07864))
^ Paradigm ^ Description ^ Examples ^
| Single Agent | One LLM agent solving tasks autonomously | AutoGPT, HuggingGPT, WebGPT |
| Multi-Agent | Multiple agents cooperating or competing | Generative Agents, CAMEL, AgentVerse |
| Human-Agent | Collaboration between humans and LLM agents | Copilot, interactive assistants |
The survey further examines agent societies, covering:
* Cooperative vs. competitive dynamics
* Communication protocols between agents
* Emergent social behaviors
* Simulation of human social systems
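A minimal cooperative exchange can be sketched as message passing between two role-specialized agents, loosely in the spirit of CAMEL-style role play. The rule-based `respond` functions stand in for actual LLM calls, and all names here are hypothetical.

<code python>
class Agent:
    """Minimal message-passing agent; `respond` is a rule-based stand-in for an LLM."""

    def __init__(self, name, respond):
        self.name = name
        self.respond = respond
        self.inbox = []

    def receive(self, message):
        self.inbox.append(message)
        return self.respond(message)

# Cooperative pair: a planner proposes steps, an executor acknowledges them
planner  = Agent("planner",  lambda msg: f"plan: 1) research {msg} 2) draft 3) review")
executor = Agent("executor", lambda msg: f"executing -> {msg}")

transcript = []
plan = planner.receive("LLM agent survey")
transcript.append(("planner", plan))
ack = executor.receive(plan)
transcript.append(("executor", ack))
</code>

Competitive dynamics would use the same message-passing skeleton with adversarial rather than complementary roles.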
===== Key Contributions =====
* **Unified framework**: Brain/Perception/Action provides a systematic lens for analyzing any LLM agent
* **Comprehensive taxonomy**: Covers 100+ papers across agent construction, applications, and evaluation
* **Historical context**: Traces agents from philosophy through classical AI to LLM era
* **Research roadmap**: Identifies open challenges including robustness, safety, and evaluation
* **1,500+ citations**: Most-cited survey in the LLM agent field(([[https://github.com/WooooDyy/LLM-Agent-Paper-List|Companion Paper List Repository]]))((https://arxiv.org/abs/2309.07864))
===== Code Example =====
<code python>
# Conceptual implementation of the Brain-Perception-Action framework
class LLMAgent:
    def __init__(self, llm, tools, memory_store):
        self.brain = BrainModule(llm)
        self.perception = PerceptionModule(modalities=['text', 'vision'])
        self.action = ActionModule(tools=tools)
        self.memory = MemoryModule(memory_store)

    def step(self, observation, goal):
        # Perception: process multimodal input
        processed_obs = self.perception.process(observation)
        # Brain: reason and plan with memory context
        memory_context = self.memory.retrieve(processed_obs)
        plan = self.brain.reason(processed_obs, memory_context, goal)
        # Action: execute the plan
        result = self.action.execute(plan)
        # Update memory
        self.memory.store(processed_obs, plan, result)
        return result
</code>
===== See Also =====
* [[agent_survey_comparison|Comparison of LLM Agent Surveys]]
* [[autogpt|AutoGPT: Autonomous Agents]]
* [[generative_agents|Generative Agents]]
* [[agenttuning|AgentTuning: Instruction-Tuning for Agent Abilities]]
===== References =====