
Cognitive Architectures for Language Agents (CoALA)

The Cognitive Architectures for Language Agents (CoALA) framework, proposed by Sumers et al. (2023), provides a systematic taxonomy for organizing LLM-based language agents into modular components inspired by cognitive science. Drawing on decades of research in cognitive architectures such as Soar and ACT-R, CoALA formalizes the design space of language agents through memory modules, structured action spaces, and decision-making procedures.

Overview

As language-model-based agents proliferate, from ReAct to Reflexion to Voyager, the field lacks a unifying framework to compare, categorize, and design them. CoALA addresses this by proposing a modular architecture that retrospectively organizes existing agents and prospectively identifies gaps in the design space. The framework defines an agent as a tuple:

$$A = (M_w, M_{lt}, \mathcal{A}_i, \mathcal{A}_e, D)$$

where $M_w$ is working memory, $M_{lt}$ is long-term memory, $\mathcal{A}_i$ is the internal action space, $\mathcal{A}_e$ is the external action space, and $D$ is the decision procedure.
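
This tuple maps naturally onto a simple container type. The sketch below is a hypothetical rendering for orientation only; the class and field names are assumptions chosen to mirror the notation, not an interface defined in the paper.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical rendering of the CoALA tuple A = (M_w, M_lt, A_i, A_e, D);
# names mirror the notation above and are not from the paper.
@dataclass
class LanguageAgent:
    working_memory: List[str] = field(default_factory=list)             # M_w
    long_term_memory: Dict[str, object] = field(default_factory=dict)   # M_lt: episodic, semantic, procedural
    internal_actions: Dict[str, Callable] = field(default_factory=dict) # A_i: retrieve, reason, learn
    external_actions: Dict[str, Callable] = field(default_factory=dict) # A_e: grounding actions
    decision_procedure: Optional[Callable] = None                       # D: the decision loop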

Memory Modules

CoALA divides agent memory into working memory and three types of long-term memory, mirroring distinctions from cognitive psychology:

  • Working Memory: A short-term scratchpad holding the agent's current context, recent observations, intermediate reasoning results, and partial plans. Analogous to the limited-capacity buffer in human cognition.
  • Episodic Memory: Stores past experiences and events (e.g., “What happened when I tried approach X?”). Enables learning from specific interaction histories.
  • Semantic Memory: Holds factual world knowledge (e.g., “Water boils at 100°C at sea level”). Can be stored as text, embeddings, or knowledge graphs; a retrieval sketch follows this list.
  • Procedural Memory: Encodes skills and procedures, often represented as code snippets, tool definitions, or implicitly within LLM parameters. Defines how to perform actions.
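
As one concrete possibility for the semantic store, the sketch below keeps facts alongside embedding vectors and retrieves the closest matches by cosine similarity. The embed callable is an assumed stand-in for any sentence-embedding model; none of these names come from the paper.

import numpy as np

# Hypothetical embedding-backed semantic memory; `embed` is an assumed
# callable mapping a string to a vector (np.ndarray).
class SemanticMemory:
    def __init__(self, embed):
        self.embed = embed
        self.facts, self.vectors = [], []

    def store(self, fact):
        self.facts.append(fact)
        self.vectors.append(self.embed(fact))

    def retrieve(self, query, k=3):
        # Rank stored facts by cosine similarity to the query embedding
        q = self.embed(query)
        sims = [float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-9))
                for v in self.vectors]
        top = sorted(range(len(sims)), key=sims.__getitem__, reverse=True)[:k]
        return [self.facts[i] for i in top]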

Action Spaces

Actions are partitioned into internal and external categories:

Internal Actions

  • Retrieval: Reading from long-term memory stores
  • Reasoning: Updating working memory via LLM inference (chain-of-thought, reflection)
  • Learning: Writing new information to long-term memory (a minimal sketch of all three actions follows this list)
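
A minimal sketch grouping the three internal actions behind one interface, assuming a generic llm.complete method and a long-term memory store with retrieve and store methods (all names are illustrative):

# Hypothetical grouping of CoALA's internal actions; llm.complete and
# the ltm retrieve/store methods are assumed stand-ins.
class InternalActions:
    def __init__(self, llm, ltm):
        self.llm = llm  # any text-completion model
        self.ltm = ltm  # a long-term memory store

    def retrieve(self, query):
        # Retrieval: read from long-term memory into working memory
        return self.ltm.retrieve(query)

    def reason(self, working_memory):
        # Reasoning: update working memory via LLM inference
        prompt = "\n".join(working_memory) + "\nThink step by step:"
        return self.llm.complete(prompt)

    def learn(self, experience):
        # Learning: write new information to long-term memory
        self.ltm.store(experience)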

External Actions

  • Grounding: Interacting with the outside world through tool use, API calls, web browsing, or robotic control

The sketch below combines the memory modules and action spaces into a simplified agent loop. Helper methods such as should_act_externally, retrieve, select_external_action, and execute are left abstract:

# Simplified CoALA agent loop
class CoALAAgent:
    def __init__(self, llm, episodic_mem, semantic_mem, procedural_mem):
        self.llm = llm
        self.working_memory = []          # short-term scratchpad
        self.episodic = episodic_mem      # past experiences
        self.semantic = semantic_mem      # factual knowledge
        self.procedural = procedural_mem  # skills and procedures

    def decision_loop(self, observation):
        self.working_memory.append(observation)
        # Planning stage: internal actions (retrieval, reasoning) until
        # the agent commits to an external action
        while not self.should_act_externally():
            retrieved = self.retrieve(self.working_memory)
            reasoning = self.llm.reason(self.working_memory + retrieved)
            self.working_memory.append(reasoning)
        # Execution stage: ground the selected action in the environment
        action = self.select_external_action(self.working_memory)
        result = self.execute(action)
        # Learning: write the new experience to episodic memory
        self.episodic.store(observation, action, result)
        return result

Decision Procedures

CoALA formalizes decision-making as a repeating decision cycle with two stages:

  1. Planning Stage: The agent iteratively applies reasoning and retrieval to propose, evaluate, and select actions. This may involve multi-step deliberation or simple reactive mappings.
  2. Execution Stage: The selected action is performed (grounding or learning), the environment returns new observations, and the cycle repeats.

This places agents on a spectrum from purely reactive (a single LLM call maps each observation to an action) to deliberative (multi-step internal planning before acting).
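
As a rough illustration of the planning stage, the sketch below proposes several candidate actions, scores each with the LLM, and selects the best. The prompts and function names are assumptions, not a procedure specified by the paper.

# Hypothetical propose-evaluate-select planning step
def plan(llm, working_memory, n_candidates=3):
    context = "\n".join(working_memory)
    # Propose: sample several candidate next actions
    candidates = [llm.complete(f"{context}\nPropose one next action:")
                  for _ in range(n_candidates)]

    # Evaluate: ask the LLM to score each candidate
    def score(action):
        reply = llm.complete(
            f"{context}\nRate this action from 0 to 10: {action}\nScore:")
        try:
            return float(reply.strip().split()[0])
        except (ValueError, IndexError):
            return 0.0

    # Select: keep the highest-scoring candidate
    return max(candidates, key=score)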

Connections to Cognitive Science

CoALA explicitly builds on classical cognitive architectures:

  • Soar: Production rules in long-term memory match working memory contents to trigger actions. CoALA replaces these symbolic productions with LLM-based reasoning (a toy example follows this list).
  • ACT-R: Distinguishes declarative and procedural memory with activation-based retrieval. CoALA's memory taxonomy mirrors this structure.
  • Global Workspace Theory: Working memory serves as a shared workspace where different modules contribute and compete for attention.
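
For contrast with CoALA's LLM-based reasoning, the toy sketch below shows what Soar-style production matching looks like: each rule pairs a condition over working-memory contents with an action. The rules are invented for illustration and greatly simplify real Soar productions.

# Toy Soar-style productions: condition over working memory -> action
productions = [
    (lambda wm: "door is locked" in wm, "retrieve key"),
    (lambda wm: "key in hand" in wm, "unlock door"),
]

def match_and_fire(working_memory):
    for condition, action in productions:
        if condition(working_memory):
            return action
    return None  # no rule matched; a CoALA agent would fall back to LLM reasoning

match_and_fire({"door is locked"})  # -> "retrieve key"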

The framework positions LLM agents within a 50-year lineage of AI research, arguing that cognitive architectures provide the missing organizational structure for the rapidly expanding space of language agents.
