The Cognitive Architectures for Language Agents (CoALA) framework, introduced by Sumers et al. (2023), provides a systematic taxonomy for organizing and understanding language agents built on large language models. Drawing deeply from cognitive science and classical AI architectures such as Soar, CoALA decomposes agents into modular components: memory systems, action spaces, and decision-making procedures. This principled decomposition enables both retrospective analysis of existing agents (e.g., ReAct, Reflexion) and prospective identification of underexplored design dimensions.
CoALA formalizes a language agent as a tuple:
$$A = (M_w, M_{lt}, \mathcal{A}_i, \mathcal{A}_e, D)$$
where $M_w$ denotes working memory, $M_{lt}$ long-term memory, $\mathcal{A}_i$ internal actions, $\mathcal{A}_e$ external actions, and $D$ the decision-making procedure. The LLM serves as the core computational engine that processes and transforms information across these components.
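As a rough illustration of this decomposition, the tuple can be mirrored as a container type. This is a sketch with illustrative field names of my own, not an API from the paper:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical sketch of the CoALA tuple A = (M_w, M_lt, A_i, A_e, D).
# Field names are illustrative, not from the paper.
@dataclass
class LanguageAgent:
    working_memory: list = field(default_factory=list)          # M_w
    long_term_memory: dict = field(default_factory=dict)        # M_lt (episodic/semantic/procedural)
    internal_actions: dict = field(default_factory=dict)        # A_i: name -> callable over memory
    external_actions: dict = field(default_factory=dict)        # A_e: name -> callable over environment
    decide: Optional[Callable] = None                           # D: the decision procedure
```

The point of the structure is separation of concerns: memories hold state, action registries hold capabilities, and the decision procedure selects among them.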
CoALA divides agent memory into short-term and long-term stores, mirroring distinctions from cognitive psychology:
Working Memory ($M_w$): A short-term scratchpad holding the agent's current context — recent observations, partial plans, and intermediate reasoning outputs. This is analogous to the limited-capacity buffer in human cognition.
Long-Term Memory ($M_{lt}$) is subdivided into three modules:
Episodic Memory: records of the agent's past experiences, such as previous interaction trajectories and their outcomes.
Semantic Memory: the agent's knowledge about the world and about itself, such as facts retrieved or distilled from experience.
Procedural Memory: the agent's skills, encompassing both the LLM's weights (implicit) and the agent's own code (explicit).
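A minimal sketch of one such module: an episodic store whose interface matches the `retrieve`/`store` calls used in the decision-loop code later in this article. The keyword-overlap scoring is a stand-in for a real retriever (e.g., embedding similarity), not anything CoALA specifies:

```python
class EpisodicMemory:
    """Toy episodic store: keeps (observation, outcome) episodes and
    retrieves the episode whose observation shares the most words with
    a query. A stand-in for a real vector-based retriever."""

    def __init__(self):
        self.episodes = []

    def store(self, observation: str, outcome: str) -> None:
        # Learning: append a new experience to long-term memory
        self.episodes.append((observation, outcome))

    def retrieve(self, query: str):
        # Retrieval: return the most lexically similar past episode
        if not self.episodes:
            return None
        words = set(query.lower().split())
        return max(self.episodes,
                   key=lambda ep: len(words & set(ep[0].lower().split())))

mem = EpisodicMemory()
mem.store("user asked for the weather in Paris", "returned 18C, cloudy")
mem.store("user asked to book a flight", "opened booking tool")
print(mem.retrieve("what is the weather today"))
# -> ('user asked for the weather in Paris', 'returned 18C, cloudy')
```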
Actions are partitioned into internal operations on memory and external interactions with the environment:
Internal Actions ($\mathcal{A}_i$): operations on the agent's own memory — retrieval (reading from long-term memory into working memory), reasoning (using the LLM to transform working memory contents), and learning (writing new information into long-term memory).
External Actions ($\mathcal{A}_e$): grounding actions that affect the outside world — controlling physical or simulated environments, engaging in dialogue with humans or other agents, and invoking digital tools such as APIs or code interpreters.
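To make the partition concrete, here is an illustrative sketch (names and tools are my own, not from the paper) in which internal actions only read or mutate memory, while external actions call out to the environment:

```python
# Illustrative partition of the action space; names are not from the paper.
# Internal actions operate purely on memory contents.
internal_actions = {
    "retrieve": lambda memory, query: [m for m in memory if query in m],
    "learn":    lambda memory, item: memory.append(item),
}

# External actions ground the agent in an (stubbed) outside world.
external_actions = {
    "search": lambda query: f"[web results for {query!r}]",  # stub tool call
    "say":    lambda text: f"[agent says: {text}]",          # stub dialogue
}

long_term = ["python is a language", "paris is in france"]
internal_actions["learn"](long_term, "coala organizes agents")   # write to memory
hits = internal_actions["retrieve"](long_term, "paris")          # read from memory
reply = external_actions["say"](hits[0])                         # act externally
print(reply)  # -> [agent says: paris is in france]
```

The distinction matters for the decision loop: internal actions can be chained freely while planning, whereas external actions commit the agent to an effect on the world.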
The decision procedure $D$ operates as a continuous loop with two alternating stages: a planning stage, in which the agent uses reasoning and retrieval to propose, evaluate, and select a candidate action, and an execution stage, in which the selected action — internal or external — is carried out.
This creates the characteristic perceive-think-act cycle that distinguishes agents from single-pass LLM inference.
```python
# Simplified CoALA decision loop
class CoALAAgent:
    def __init__(self, llm, episodic_mem, semantic_mem, procedural_mem):
        self.llm = llm
        self.working_memory = []
        self.episodic = episodic_mem
        self.semantic = semantic_mem
        self.procedural = procedural_mem

    def step(self, observation):
        self.working_memory.append(observation)
        # Internal actions: retrieval + reasoning
        context = self.semantic.retrieve(observation)
        past = self.episodic.retrieve(observation)
        self.working_memory.extend([context, past])
        # Decision via LLM reasoning
        plan = self.llm.reason(self.working_memory)
        # External action: grounding
        result = self.execute(plan.action)
        # Learning: store experience
        self.episodic.store(observation, plan, result)
        return result

    def execute(self, action):
        # Grounding is environment-specific; subclasses supply it
        raise NotImplementedError
```
CoALA explicitly builds on classical cognitive architectures, particularly Soar (Laird, Newell & Rosenbloom, 1987), where production rules in long-term memory match against working memory to fire actions. CoALA replaces rigid production rules with flexible LLM-based reasoning while retaining the modular memory-action-decision structure. The framework also draws on Tulving's (1972) distinction between episodic and semantic memory and Anderson's (1983) ACT-R model of procedural knowledge.
This grounding in cognitive science enables CoALA to serve as a bridge between decades of cognitive architecture research and the emerging field of LLM-based agents, providing a common vocabulary for comparing disparate systems.
The survey of existing agents through the CoALA lens reveals that most current systems underutilize long-term memory — particularly episodic and procedural memory. Many agents rely solely on in-context learning (working memory) without persistent storage, limiting their ability to improve over time.