The Cognitive Architectures for Language Agents (CoALA) framework, introduced by Sumers et al. (2023), provides a systematic taxonomy for organizing and understanding language agents built on large language models. Drawing deeply from cognitive science and classical AI architectures such as Soar, CoALA decomposes agents into modular components: memory systems, action spaces, and decision-making procedures. This principled decomposition enables both retrospective analysis of existing agents (e.g., ReAct, Reflexion) and prospective identification of underexplored design dimensions.
CoALA formalizes a language agent as a tuple:
$$A = (M_w, M_{lt}, \mathcal{A}_i, \mathcal{A}_e, D)$$
where $M_w$ denotes working memory, $M_{lt}$ long-term memory, $\mathcal{A}_i$ internal actions, $\mathcal{A}_e$ external actions, and $D$ the decision-making procedure. The LLM serves as the core computational engine that processes and transforms information across these components.
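As a rough illustration of this decomposition, the tuple can be mirrored as a container type. This is a sketch with illustrative field names of my own, not an API from the paper:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical sketch of the CoALA tuple A = (M_w, M_lt, A_i, A_e, D).
# Field names are illustrative, not from the paper.
@dataclass
class LanguageAgent:
    working_memory: list = field(default_factory=list)          # M_w
    long_term_memory: dict = field(default_factory=dict)        # M_lt (episodic/semantic/procedural)
    internal_actions: dict = field(default_factory=dict)        # A_i: name -> callable over memory
    external_actions: dict = field(default_factory=dict)        # A_e: name -> callable over environment
    decide: Optional[Callable] = None                           # D: the decision procedure
```

The point of the structure is separation of concerns: memories hold state, action registries hold capabilities, and the decision procedure selects among them.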
CoALA divides agent memory into short-term and long-term stores, mirroring distinctions from cognitive psychology:
Working Memory ($M_w$): A short-term scratchpad holding the agent's current context — recent observations, partial plans, and intermediate reasoning outputs. This is analogous to the limited-capacity buffer in human cognition.
Long-Term Memory ($M_{lt}$) is subdivided into three modules:
Episodic Memory: records of the agent's past experiences, such as previous interaction trajectories and their outcomes.
Semantic Memory: the agent's knowledge about the world and about itself, such as facts retrieved or distilled from experience.
Procedural Memory: the agent's skills, encompassing both the LLM's weights (implicit) and the agent's own code (explicit).
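A minimal sketch of one such module: an episodic store whose interface matches the `retrieve`/`store` calls used in the decision-loop code later in this article. The keyword-overlap scoring is a stand-in for a real retriever (e.g., embedding similarity), not anything CoALA specifies:

```python
class EpisodicMemory:
    """Toy episodic store: keeps (observation, outcome) episodes and
    retrieves the episode whose observation shares the most words with
    a query. A stand-in for a real vector-based retriever."""

    def __init__(self):
        self.episodes = []

    def store(self, observation: str, outcome: str) -> None:
        # Learning: append a new experience to long-term memory
        self.episodes.append((observation, outcome))

    def retrieve(self, query: str):
        # Retrieval: return the most lexically similar past episode
        if not self.episodes:
            return None
        words = set(query.lower().split())
        return max(self.episodes,
                   key=lambda ep: len(words & set(ep[0].lower().split())))

mem = EpisodicMemory()
mem.store("user asked for the weather in Paris", "returned 18C, cloudy")
mem.store("user asked to book a flight", "opened booking tool")
print(mem.retrieve("what is the weather today"))
# -> ('user asked for the weather in Paris', 'returned 18C, cloudy')
```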
Actions are partitioned into internal operations on memory and external interactions with the environment:
Internal Actions ($\mathcal{A}_i$): operations on the agent's own memory — retrieval (reading from long-term memory into working memory), reasoning (using the LLM to transform working memory contents), and learning (writing new information into long-term memory).
External Actions ($\mathcal{A}_e$): grounding actions that affect the outside world — controlling physical or simulated environments, engaging in dialogue with humans or other agents, and invoking digital tools such as APIs or code interpreters.
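To make the partition concrete, here is an illustrative sketch (names and tools are my own, not from the paper) in which internal actions only read or mutate memory, while external actions call out to the environment:

```python
# Illustrative partition of the action space; names are not from the paper.
# Internal actions operate purely on memory contents.
internal_actions = {
    "retrieve": lambda memory, query: [m for m in memory if query in m],
    "learn":    lambda memory, item: memory.append(item),
}

# External actions ground the agent in an (stubbed) outside world.
external_actions = {
    "search": lambda query: f"[web results for {query!r}]",  # stub tool call
    "say":    lambda text: f"[agent says: {text}]",          # stub dialogue
}

long_term = ["python is a language", "paris is in france"]
internal_actions["learn"](long_term, "coala organizes agents")   # write to memory
hits = internal_actions["retrieve"](long_term, "paris")          # read from memory
reply = external_actions["say"](hits[0])                         # act externally
print(reply)  # -> [agent says: paris is in france]
```

The distinction matters for the decision loop: internal actions can be chained freely while planning, whereas external actions commit the agent to an effect on the world.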
The decision procedure $D$ operates as a continuous loop with two alternating stages: a planning stage, in which the agent uses reasoning and retrieval to propose, evaluate, and select a candidate action, and an execution stage, in which the selected action — internal or external — is carried out.
This creates the characteristic perceive-think-act cycle that distinguishes agents from single-pass LLM inference.
```python
# Simplified CoALA decision loop
class CoALAAgent:
    def __init__(self, llm, episodic_mem, semantic_mem, procedural_mem):
        self.llm = llm
        self.working_memory = []
        self.episodic = episodic_mem
        self.semantic = semantic_mem
        self.procedural = procedural_mem

    def step(self, observation):
        self.working_memory.append(observation)
        # Internal actions: retrieval + reasoning
        context = self.semantic.retrieve(observation)
        past = self.episodic.retrieve(observation)
        self.working_memory.extend([context, past])
        # Decision via LLM reasoning
        plan = self.llm.reason(self.working_memory)
        # External action: grounding
        result = self.execute(plan.action)
        # Learning: store experience
        self.episodic.store(observation, plan, result)
        return result

    def execute(self, action):
        # Grounding is environment-specific; subclasses supply it
        raise NotImplementedError
```
CoALA explicitly builds on classical cognitive architectures, particularly Soar (Laird, Newell & Rosenbloom, 1987), where production rules in long-term memory match against working memory to fire actions. CoALA replaces rigid production rules with flexible LLM-based reasoning while retaining the modular memory-action-decision structure. The framework also draws on Tulving's (1972) distinction between episodic and semantic memory and Anderson's (1983) ACT-R model of procedural knowledge.
This grounding in cognitive science enables CoALA to serve as a bridge between decades of cognitive architecture research and the emerging field of LLM-based agents, providing a common vocabulary for comparing disparate systems.
The survey of existing agents through the CoALA lens reveals that most current systems underutilize long-term memory — particularly episodic and procedural memory. Many agents rely solely on in-context learning (working memory) without persistent storage, limiting their ability to improve over time.