====== Cognitive Architectures for Language Agents (CoALA) ======

The **Cognitive Architectures for Language Agents (CoALA)** framework, proposed by Sumers et al. (2023), provides a systematic taxonomy for organizing LLM-based language agents into modular components inspired by cognitive science. Drawing on decades of research in cognitive architectures such as Soar and ACT-R, CoALA formalizes the design space of language agents through memory modules, structured action spaces, and decision-making procedures.

===== Overview =====

As language-model-based agents proliferate, from ReAct to Reflexion to Voyager, the field has lacked a unifying framework to compare, categorize, and design them. CoALA addresses this by proposing a modular architecture that retrospectively organizes existing agents and prospectively identifies gaps in the design space.

The framework defines an agent as a tuple:

$$A = (M_w, M_{lt}, \mathcal{A}_i, \mathcal{A}_e, D)$$

where $M_w$ is working memory, $M_{lt}$ is long-term memory, $\mathcal{A}_i$ is the internal action space, $\mathcal{A}_e$ is the external action space, and $D$ is the decision procedure.

===== Memory Modules =====

CoALA divides agent memory into **working memory** and three types of **long-term memory**, mirroring distinctions from cognitive psychology:

  * **Working Memory**: A short-term scratchpad holding the agent's current context: recent observations, intermediate reasoning results, and partial plans. Analogous to the limited-capacity buffer in human cognition.
  * **Episodic Memory**: Stores past experiences and events (e.g., "What happened when I tried approach X?"). Enables learning from specific interaction histories.
  * **Semantic Memory**: Holds factual world knowledge (e.g., "Water boils at 100°C at sea level"). Can be stored as text, embeddings, or knowledge graphs.
  * **Procedural Memory**: Encodes skills and procedures, often represented as code snippets, tool definitions, or implicitly within LLM parameters.
Defines //how// to perform actions.

===== Action Spaces =====

Actions are partitioned into **internal** and **external** categories:

=== Internal Actions ===

  * **Retrieval**: Reading from long-term memory stores
  * **Reasoning**: Updating working memory via LLM inference (chain-of-thought, reflection)
  * **Learning**: Writing new information to long-term memory

=== External Actions ===

  * **Grounding**: Interacting with the outside world via tool use, API calls, web browsing, or robotic control

A simplified sketch of the resulting agent loop (helper methods such as ''should_act_externally'', ''retrieve'', ''select_external_action'', and ''execute'' are omitted for brevity):

<code python>
# Simplified CoALA agent loop
class CoALAAgent:
    def __init__(self, llm, episodic_mem, semantic_mem, procedural_mem):
        self.llm = llm
        self.working_memory = []          # short-term scratchpad
        self.episodic = episodic_mem      # past experiences
        self.semantic = semantic_mem      # world knowledge
        self.procedural = procedural_mem  # skills and procedures

    def decision_loop(self, observation):
        self.working_memory.append(observation)
        # Planning stage: internal actions (retrieval, reasoning)
        while not self.should_act_externally():
            retrieved = self.retrieve(self.working_memory)
            reasoning = self.llm.reason(self.working_memory + retrieved)
            self.working_memory.append(reasoning)
        # Execution stage: ground the chosen action, then learn from the result
        action = self.select_external_action(self.working_memory)
        result = self.execute(action)
        self.episodic.store(observation, action, result)
        return result
</code>

===== Decision Procedures =====

CoALA formalizes decision-making as a continuous loop with two stages:

  - **Planning Stage**: The agent iteratively applies reasoning and retrieval to propose, evaluate, and select actions. This may involve multi-step deliberation or simple reactive mappings.
  - **Execution Stage**: The selected action is performed (grounding or learning), the environment returns new observations, and the cycle repeats.

This places agents on a spectrum from purely **reactive** (a single LLM call maps observation to action) to **deliberative** (multi-step internal planning before acting).
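The long-term memory taxonomy can be sketched as simple stores behind a shared interface. This is an illustrative sketch, not code from the CoALA paper: the ''MemoryStore'' class and its naive keyword-overlap retrieval are assumptions standing in for embedding-, graph-, or database-backed retrieval.

```python
# Illustrative sketch of CoALA's three long-term memory stores.
# Retrieval here is naive keyword overlap; real agents typically use
# embedding similarity, knowledge graphs, or structured queries.

class MemoryStore:
    """Append-only store of text entries with scored retrieval."""

    def __init__(self):
        self.entries = []

    def store(self, entry):
        self.entries.append(entry)

    def retrieve(self, query, k=2):
        """Return the k entries sharing the most words with the query."""
        q = set(query.lower().split())
        ranked = sorted(self.entries,
                        key=lambda e: len(q & set(e.lower().split())),
                        reverse=True)
        return ranked[:k]

# One store per long-term memory type in the taxonomy
episodic = MemoryStore()    # past experiences and events
semantic = MemoryStore()    # factual world knowledge
procedural = MemoryStore()  # skills, e.g. code snippets

episodic.store("Tried approach X on task 3; it failed with a timeout.")
semantic.store("Water boils at 100°C at sea level.")
procedural.store("def fetch(url): return requests.get(url).text")

print(semantic.retrieve("At what temperature does water boil?", k=1))
```

In a full agent, these stores would back the internal ''Retrieval'' and ''Learning'' actions, with working memory holding whatever ''retrieve'' returns.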
===== Connections to Cognitive Science =====

CoALA explicitly builds on classical cognitive architectures:

  * **Soar**: Production rules in long-term memory match working-memory contents to trigger actions. CoALA replaces symbolic productions with LLM-based reasoning.
  * **ACT-R**: Distinguishes declarative and procedural memory with activation-based retrieval. CoALA's memory taxonomy mirrors this structure.
  * **Global Workspace Theory**: Working memory serves as a shared workspace where different modules contribute and compete for attention.

The framework positions LLM agents within a 50-year lineage of AI research, arguing that cognitive architectures provide the missing organizational structure for the rapidly expanding space of language agents.

===== References =====

  * [[https://arxiv.org/abs/2309.02427|Sumers et al. "Cognitive Architectures for Language Agents" (2023)]]
  * [[https://arxiv.org/abs/2303.11366|Shinn et al. "Reflexion: Language Agents with Verbal Reinforcement Learning" (2023)]]
  * [[https://arxiv.org/abs/2210.03629|Yao et al. "ReAct: Synergizing Reasoning and Acting in Language Models" (2023)]]

===== See Also =====

  * [[tool_learning_foundation_models|Tool Learning with Foundation Models]]
  * [[self_refine|Self-Refine]]
  * [[chain_of_verification|Chain-of-Verification]]