The Rise and Potential of Large Language Model Based Agents: A Survey

This landmark survey by Xi et al. (2023)1) from the Fudan NLP Group provides a comprehensive overview of LLM-based agents, proposing a unifying conceptual framework of brain, perception, and action modules.2) With over 1,500 citations, it is among the most influential surveys in the LLM-agent space.

Overview

The survey traces the concept of agents from philosophical origins (Descartes, Locke, Hume) through AI history (symbolic AI, reinforcement learning) to the modern era where LLMs serve as the foundation for general-purpose agents.3) The central thesis: LLMs possess the versatile capabilities needed to serve as a starting point for designing AI agents that can adapt to diverse scenarios.

Published in Science China Information Sciences (2025)4), the paper covers single-agent systems, multi-agent cooperation, and human-agent interaction.

The Brain-Perception-Action Framework

graph TD
  subgraph Brain
    B1[Natural Language Understanding]
    B2[Knowledge & Memory]
    B3[Reasoning & Planning]
    B4[Transferability & Generalization]
  end
  subgraph Perception
    P1[Textual Input]
    P2[Visual Input]
    P3[Auditory Input]
    P4[Other Modalities]
  end
  subgraph Action
    A1[Textual Output]
    A2[Tool Use]
    A3[Embodied Action]
  end
  P1 --> B1
  P2 --> B1
  P3 --> B1
  P4 --> B1
  B3 --> A1
  B3 --> A2
  B3 --> A3
  B2 --> B3
  B1 --> B3

Brain Module

The brain is the LLM itself, providing the core cognitive functions:5)

  * Natural language understanding and generation
  * Knowledge and memory
  * Reasoning and planning
  * Transferability and generalization

The agent's decision at each step can be formalized as:

<latex>a_t = \pi_\theta(o_t, m_t, g)</latex>

where <latex>\pi_\theta</latex> is the LLM-based policy, <latex>o_t</latex> is the current observation, <latex>m_t</latex> is the memory state, and <latex>g</latex> is the goal.
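As a concrete illustration, this policy abstraction can be sketched as a single Python function. The prompt layout and the `llm_complete` callable below are illustrative assumptions, standing in for any LLM completion API:

```python
# Hedged sketch of a_t = pi_theta(o_t, m_t, g): the policy is one LLM call
# conditioned on the goal, the retrieved memory, and the current observation.
def policy_step(llm_complete, observation, memory, goal):
    prompt = (
        f"Goal: {goal}\n"
        f"Memory: {memory}\n"
        f"Observation: {observation}\n"
        "Next action:"
    )
    return llm_complete(prompt)

# Trivial stand-in "LLM" that always proposes the same action
action = policy_step(lambda p: "search('weather Berlin')",
                     observation="user asks about the weather in Berlin",
                     memory=[], goal="answer the user")
# → "search('weather Berlin')"
```

The key point is that the policy is stateless by itself; all state enters through the memory argument, which the memory module maintains across steps.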

Perception Module

Perception extends the agent beyond text:

  * Textual input (native to the LLM)
  * Visual input
  * Auditory input
  * Other modalities
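A minimal sketch of modality routing, where each input type is mapped into the LLM's text space. The encoder functions here are illustrative placeholders; real systems use learned encoders such as image captioners or speech-to-text models:

```python
# Hedged sketch: route multimodal inputs into text the brain can consume.
def perceive(inputs):
    """inputs: list of (modality, payload) pairs -> list of text strings."""
    encoders = {
        "text": lambda x: x,                           # pass through as-is
        "image": lambda x: f"<image:{len(x)} bytes>",  # stand-in for a caption
        "audio": lambda x: f"<audio:{len(x)} bytes>",  # stand-in for a transcript
    }
    return [encoders[kind](payload) for kind, payload in inputs]

observations = perceive([("text", "what is in this picture?"),
                         ("image", b"\x89PNG")])
```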

Action Module

Actions are the agent's interface with the world:

  * Textual output (dialogue, generated content)
  * Tool use (APIs, search engines, code execution)
  * Embodied action (controlling robots or simulated environments)
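Tool use is commonly implemented as a registry of named callables that the brain's plan selects from. The `(name, argument)` plan format and the tool names below are assumptions for illustration, not prescribed by the survey:

```python
# Hedged sketch: tool use as dispatch over a name -> callable registry.
def execute_tool(tools, plan):
    """plan: a (tool_name, argument) pair produced by the brain module."""
    name, arg = plan
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return tools[name](arg)

tools = {
    "search": lambda q: f"results for {q!r}",
    "calculator": lambda expr: eval(expr),  # toy example; avoid eval in practice
}
answer = execute_tool(tools, ("calculator", "2 + 3"))  # → 5
```

Failing loudly on an unknown tool name matters in practice, since LLMs sometimes hallucinate tools that were never registered.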

Agent Application Taxonomy

The survey categorizes agent applications into three paradigms:6)

^ Paradigm ^ Description ^ Examples ^
| Single Agent | One LLM agent solving tasks autonomously | AutoGPT, HuggingGPT, WebGPT |
| Multi-Agent | Multiple agents cooperating or competing | Generative Agents, CAMEL, AgentVerse |
| Human-Agent | Collaboration between humans and LLM agents | Copilot, interactive assistants |
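The multi-agent paradigm's core loop, in the spirit of role-playing systems like CAMEL, is alternating message passing between agents. The turn-taking skeleton can be sketched as follows, with trivial callables standing in for LLM-backed agents:

```python
# Hedged sketch of two-agent turn taking; each agent is a callable that
# maps the incoming message to a reply.
def converse(agent_a, agent_b, opening, turns=2):
    transcript = [("A", opening)]
    message = opening
    for i in range(turns):
        speaker, agent = (("B", agent_b) if i % 2 == 0 else ("A", agent_a))
        message = agent(message)
        transcript.append((speaker, message))
    return transcript

# Trivial stand-in agents: one drafts, one refines
t = converse(lambda m: m + " (refined)",
             lambda m: m + " (drafted)",
             "Write a haiku", turns=2)
```

Real systems add role prompts, termination conditions, and shared memory on top of this loop, but the alternation itself is the structural difference from the single-agent paradigm.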

The survey further examines agent societies, covering agent behavior and personality, the environments agents inhabit, and the simulation of emergent social phenomena.

Key Contributions

  * A unifying brain-perception-action framework for LLM-based agents
  * A taxonomy of applications spanning single-agent, multi-agent, and human-agent settings
  * An examination of agent societies and the emergent behaviors within them

Code Example

# Conceptual implementation of the Brain-Perception-Action framework
class LLMAgent:
    def __init__(self, llm, tools, memory_store):
        self.brain = BrainModule(llm)
        self.perception = PerceptionModule(modalities=['text', 'vision'])
        self.action = ActionModule(tools=tools)
        self.memory = MemoryModule(memory_store)
 
    def step(self, observation, goal):
        # Perception: process multimodal input
        processed_obs = self.perception.process(observation)
 
        # Brain: reason and plan with memory context
        memory_context = self.memory.retrieve(processed_obs)
        plan = self.brain.reason(processed_obs, memory_context, goal)
 
        # Action: execute the plan
        result = self.action.execute(plan)
 
        # Update memory
        self.memory.store(processed_obs, plan, result)
        return result

See Also

References