CogMem is a cognitively inspired, memory-augmented LLM architecture that supports sustained iterative reasoning through structured, persistent memory. Introduced by Zhang et al. (2025), CogMem addresses the fundamental limitation that LLMs excel at single-turn reasoning but lose accuracy and coherence over extended multi-turn interactions due to reasoning bias, task drift, hallucination, overconfidence, and memory decay.
Real-world applications demand iterative, multi-turn reasoning in which the model must maintain coherence across dozens or hundreds of exchanges. Current approaches typically append the full conversational history to the context, so the prompt grows without bound, inflating compute cost and diluting or truncating the information that matters for the current turn.
TurnBench evaluations highlight six recurring failure modes: reasoning bias, task drift, hallucination, overconfidence, memory decay, and error propagation.
CogMem implements a hierarchical memory system inspired by Baddeley's model of human working memory:
Layer 1: Long-Term Memory (LTM). Consolidates cross-session reasoning strategies, reusable problem-solving patterns, and distilled insights accumulated across multiple interactions. LTM entries are stored in a vectorized database enabling high-speed semantic retrieval. This layer evolves over time as new information is integrated.
Layer 2: Direct Access Memory (DA). Functions as session-level working memory, maintaining concise notes of intermediate conclusions, sub-goals, and ongoing plans. Rather than storing full conversational histories, DA preserves structured summaries of key information. At session start, DA is populated with relevant memories retrieved from LTM.
Layer 3: Focus of Attention (FoA). Dynamically reconstructs a minimal, task-relevant reasoning context for each turn by selectively integrating current DA notes, retrieved LTM entries, summarized dialogue history, and new user input into a compact prompt.
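The three layers above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: CogMem's LTM uses a vectorized database with semantic retrieval, whereas here simple keyword overlap stands in for embedding similarity, and the class and method names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    text: str
    tokens: frozenset = field(init=False)  # bag of words as a crude "embedding"

    def __post_init__(self):
        self.tokens = frozenset(self.text.lower().split())

class LongTermMemory:
    """Cross-session store of reusable strategies. Keyword overlap
    stands in for the vector similarity a real deployment would use."""
    def __init__(self):
        self.entries = []

    def add(self, text):
        self.entries.append(MemoryEntry(text))

    def semantic_search(self, query, top_k=3):
        q = frozenset(query.lower().split())
        ranked = sorted(self.entries,
                        key=lambda e: len(e.tokens & q), reverse=True)
        return [e.text for e in ranked[:top_k]]

class DirectAccessMemory:
    """Session-level working memory: structured notes rather than
    full transcripts, seeded from LTM at session start."""
    def __init__(self, ltm, session_goal):
        self.notes = ltm.semantic_search(session_goal)

    def add_note(self, note):
        self.notes.append(note)
```

A session would then instantiate `DirectAccessMemory` with its goal, pulling the most relevant long-term strategies into working memory before the first turn.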
CogMem's architecture directly parallels the influential cognitive psychology model:
| CogMem Component | Baddeley's Model | Function |
|---|---|---|
| Focus of Attention | Central Executive | Attentional control, selective focus |
| Direct Access Memory | Phonological Loop / Visuospatial Sketchpad | Short-term session storage |
| Long-Term Memory | Declarative Memory | Persistent knowledge and strategies |
This grounding in cognitive science ensures the architecture captures how humans maintain coherent reasoning across extended interactions while managing limited attentional resources.
The FoA operates as a dynamic gating function that prevents unbounded context growth. At each turn:
```python
# Simplified CogMem Focus of Attention reconstruction
class FocusOfAttention:
    def __init__(self, ltm, da_memory):
        self.ltm = ltm        # Long-term vectorized store
        self.da = da_memory   # Session-level working memory

    def reconstruct_context(self, current_input, reasoning_state):
        # Step 1: Retrieve relevant session notes
        da_notes = self.da.retrieve(reasoning_state, top_k=5)
        # Step 2: Query long-term reasoning patterns
        ltm_patterns = self.ltm.semantic_search(current_input, top_k=3)
        # Step 3: Summarize dialogue history
        history_summary = self.da.get_compressed_history()
        # Step 4: Synthesize compact context
        return compose_prompt(
            ltm_patterns,     # Cross-session strategies
            da_notes,         # Current session state
            history_summary,  # Compressed history
            current_input,    # New user turn
        )
```
CogMem's layered design targets each failure mode through specific mechanisms:
Reasoning Bias and Task Drift. LTM accumulates cross-session reasoning strategies, allowing the model to recognize and correct systematic biases by referencing prior successful patterns. DA's structured notes keep the current task goal explicitly represented.
Hallucination and Overconfidence. DA creates explicit checkpoints where conclusions must be justified. FoA enforces grounding in retrieved memories rather than relying solely on parametric knowledge.
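One way to picture such a checkpoint is a write policy on DA that refuses conclusions lacking cited evidence. The function below is an illustrative sketch, not the paper's actual mechanism or API.

```python
def record_conclusion(da_notes, claim, evidence):
    # DA-style checkpoint (illustrative): a conclusion is only
    # committed when it cites retrieved memory, which discourages
    # ungrounded, overconfident claims entering the session state.
    if not evidence:
        raise ValueError("conclusion must cite retrieved evidence")
    da_notes.append({"claim": claim, "evidence": list(evidence)})
```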
Memory Decay. Prevented through three mechanisms: DA's session-level preservation of intermediate conclusions, LTM's persistent storage of distilled reasoning traces, and FoA's selective retrieval ensuring nothing is lost through context truncation.
Error Propagation. A dual-agent design – reasoning agent and memory agent collaborating continuously – enables error recovery by restoring coherence across turns rather than propagating early mistakes.
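The dual-agent loop can be sketched as follows; `reason` and `memorize` are illustrative callables standing in for the two agents, not the paper's interfaces.

```python
def run_turn(user_input, reason, memorize, da_notes):
    # One dual-agent turn (illustrative): the memory agent curates a
    # compact context, the reasoning agent answers against it, and the
    # conclusion is checkpointed so later turns can restore coherence
    # instead of propagating an early mistake.
    context = memorize(da_notes, user_input)   # memory agent
    answer = reason(context, user_input)       # reasoning agent
    da_notes.append(f"conclusion: {answer}")   # checkpoint for recovery
    return answer
```

Because each turn both reads from and writes back to the shared notes, a later turn that detects an inconsistency can overwrite the faulty checkpoint rather than building on it.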
The context at turn $t$ is bounded rather than growing linearly:
$$C_t = \text{FoA}(\text{DA}_t, \text{LTM}, h_t, x_t) \quad \text{where} \quad |C_t| \leq K \quad \forall t$$
compared to the standard approach, where $|C_t| = \sum_{i=1}^{t} \left(|x_i| + |r_i|\right)$ grows without bound.
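The two growth regimes can be illustrated numerically. In this sketch, whitespace-split tokens stand in for real tokenization, and the one-token-per-turn summary and budget $K$ are illustrative assumptions, not CogMem's actual compression scheme.

```python
def naive_context(history):
    # Standard approach: append every prior turn verbatim,
    # so context length grows linearly with the turn count t.
    return " ".join(history)

def foa_context(history, budget_k=20):
    # FoA-style sketch: a crude one-token-per-turn summary of past
    # turns plus the newest turn verbatim, clipped to a fixed budget K.
    summary = [h.split()[0] for h in history[:-1]]
    tokens = summary + history[-1].split()
    return " ".join(tokens[-budget_k:])

# 100 turns of 5 tokens each: naive context reaches 500 tokens,
# while the FoA-style context never exceeds budget_k.
history = [f"turn {i} adds five tokens" for i in range(1, 101)]
```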
Experiments on TurnBench-MS, using Gemini 2.5 Flash as the reasoning agent, show that structured memory improves reasoning accuracy while enforcing computational scalability, a rare case where better performance comes with lower cost.
CogMem transforms LLM behavior from reactive, context-dependent reasoning into adaptive, self-consistent systems capable of sustained multi-turn inference. By drawing on decades of cognitive psychology research, it provides a principled architecture for managing the fundamental tension between comprehensive memory and bounded computation.
The three-layer design offers a general template for any system that must reason coherently over extended interactions – from customer support to scientific research assistants to autonomous coding agents.