The Rise and Potential of Large Language Model Based Agents: A Survey

This landmark survey by Xi et al. (2023)1) from the Fudan NLP Group provides a comprehensive overview of LLM-based agents, proposing a unifying conceptual framework of brain, perception, and action modules.2) With over 1,500 citations, it is among the most influential surveys in the LLM-agent space.

Overview

The survey traces the concept of agents from philosophical origins (Descartes, Locke, Hume) through AI history (symbolic AI, reinforcement learning) to the modern era where LLMs serve as the foundation for general-purpose agents.3) The central thesis: LLMs possess the versatile capabilities needed to serve as a starting point for designing AI agents that can adapt to diverse scenarios.

Published in Science China Information Sciences (2025)4), the paper covers single-agent systems, multi-agent cooperation, and human-agent interaction.

The Brain-Perception-Action Framework

graph TD
  subgraph Brain
    B1[Natural Language Understanding]
    B2[Knowledge & Memory]
    B3[Reasoning & Planning]
    B4[Transferability & Generalization]
  end
  subgraph Perception
    P1[Textual Input]
    P2[Visual Input]
    P3[Auditory Input]
    P4[Other Modalities]
  end
  subgraph Action
    A1[Textual Output]
    A2[Tool Use]
    A3[Embodied Action]
  end
  P1 --> B1
  P2 --> B1
  P3 --> B1
  P4 --> B1
  B3 --> A1
  B3 --> A2
  B3 --> A3
  B2 --> B3
  B1 --> B3

Brain Module

The brain is the LLM itself, providing the core cognitive functions:5)

  * Natural language understanding and generation
  * Knowledge and memory
  * Reasoning and planning
  * Transferability and generalization

The agent's decision at each step can be formalized as:

<latex>a_t = \pi_\theta(o_t, m_t, g)</latex>

where <latex>\pi_\theta</latex> is the LLM-based policy, <latex>o_t</latex> is the current observation, <latex>m_t</latex> is the memory state, and <latex>g</latex> is the goal.
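As a concrete illustration, this policy abstraction can be sketched as a single Python function. The prompt layout and the `llm_complete` callable below are illustrative assumptions, standing in for any LLM completion API:

```python
# Hedged sketch of a_t = pi_theta(o_t, m_t, g): the policy is one LLM call
# conditioned on the goal, the retrieved memory, and the current observation.
def policy_step(llm_complete, observation, memory, goal):
    prompt = (
        f"Goal: {goal}\n"
        f"Memory: {memory}\n"
        f"Observation: {observation}\n"
        "Next action:"
    )
    return llm_complete(prompt)

# Trivial stand-in "LLM" that always proposes the same action
action = policy_step(lambda p: "search('weather Berlin')",
                     observation="user asks about the weather in Berlin",
                     memory=[], goal="answer the user")
# → "search('weather Berlin')"
```

The key point is that the policy is stateless by itself; all state enters through the memory argument, which the memory module maintains across steps.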

Perception Module

Perception extends the agent beyond text:

  * Textual input (native to the LLM)
  * Visual input
  * Auditory input
  * Other modalities
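A minimal sketch of modality routing, where each input type is mapped into the LLM's text space. The encoder functions here are illustrative placeholders; real systems use learned encoders such as image captioners or speech-to-text models:

```python
# Hedged sketch: route multimodal inputs into text the brain can consume.
def perceive(inputs):
    """inputs: list of (modality, payload) pairs -> list of text strings."""
    encoders = {
        "text": lambda x: x,                           # pass through as-is
        "image": lambda x: f"<image:{len(x)} bytes>",  # stand-in for a caption
        "audio": lambda x: f"<audio:{len(x)} bytes>",  # stand-in for a transcript
    }
    return [encoders[kind](payload) for kind, payload in inputs]

observations = perceive([("text", "what is in this picture?"),
                         ("image", b"\x89PNG")])
```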

Action Module

Actions are the agent's interface with the world:

  * Textual output (dialogue, generated content)
  * Tool use (APIs, search engines, code execution)
  * Embodied action (controlling robots or simulated environments)
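Tool use is commonly implemented as a registry of named callables that the brain's plan selects from. The `(name, argument)` plan format and the tool names below are assumptions for illustration, not prescribed by the survey:

```python
# Hedged sketch: tool use as dispatch over a name -> callable registry.
def execute_tool(tools, plan):
    """plan: a (tool_name, argument) pair produced by the brain module."""
    name, arg = plan
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return tools[name](arg)

tools = {
    "search": lambda q: f"results for {q!r}",
    "calculator": lambda expr: eval(expr),  # toy example; avoid eval in practice
}
answer = execute_tool(tools, ("calculator", "2 + 3"))  # → 5
```

Failing loudly on an unknown tool name matters in practice, since LLMs sometimes hallucinate tools that were never registered.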

Agent Application Taxonomy

The survey categorizes agent applications into three paradigms:6)

^ Paradigm ^ Description ^ Examples ^
| Single Agent | One LLM agent solving tasks autonomously | AutoGPT, HuggingGPT, WebGPT |
| Multi-Agent | Multiple agents cooperating or competing | Generative Agents, CAMEL, AgentVerse |
| Human-Agent | Collaboration between humans and LLM agents | Copilot, interactive assistants |
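The multi-agent paradigm's core loop, in the spirit of role-playing systems like CAMEL, is alternating message passing between agents. The turn-taking skeleton can be sketched as follows, with trivial callables standing in for LLM-backed agents:

```python
# Hedged sketch of two-agent turn taking; each agent is a callable that
# maps the incoming message to a reply.
def converse(agent_a, agent_b, opening, turns=2):
    transcript = [("A", opening)]
    message = opening
    for i in range(turns):
        speaker, agent = (("B", agent_b) if i % 2 == 0 else ("A", agent_a))
        message = agent(message)
        transcript.append((speaker, message))
    return transcript

# Trivial stand-in agents: one drafts, one refines
t = converse(lambda m: m + " (refined)",
             lambda m: m + " (drafted)",
             "Write a haiku", turns=2)
```

Real systems add role prompts, termination conditions, and shared memory on top of this loop, but the alternation itself is the structural difference from the single-agent paradigm.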

The survey further examines agent societies, covering agent behavior and personality, the environments agents inhabit, and the simulation of emergent social phenomena.

Key Contributions

  * A unifying brain-perception-action framework for LLM-based agents
  * A taxonomy of applications spanning single-agent, multi-agent, and human-agent settings
  * An examination of agent societies and the emergent behaviors within them

Code Example

# Conceptual implementation of the Brain-Perception-Action framework
class LLMAgent:
    def __init__(self, llm, tools, memory_store):
        self.brain = BrainModule(llm)
        self.perception = PerceptionModule(modalities=['text', 'vision'])
        self.action = ActionModule(tools=tools)
        self.memory = MemoryModule(memory_store)
 
    def step(self, observation, goal):
        # Perception: process multimodal input
        processed_obs = self.perception.process(observation)
 
        # Brain: reason and plan with memory context
        memory_context = self.memory.retrieve(processed_obs)
        plan = self.brain.reason(processed_obs, memory_context, goal)
 
        # Action: execute the plan
        result = self.action.execute(plan)
 
        # Update memory
        self.memory.store(processed_obs, plan, result)
        return result

See Also

References