Cognitive Memory Architectures

CogMem is a cognitively inspired, memory-augmented LLM architecture that supports sustained iterative reasoning through structured, persistent memory. Introduced by Zhang et al. (2025), CogMem addresses the fundamental limitation that LLMs excel at single-turn reasoning but lose accuracy and coherence over extended multi-turn interactions due to reasoning bias, task drift, hallucination, overconfidence, and memory decay.

```mermaid
graph TD
    A[User Input] --> B[Focus of Attention]
    B --> C[Direct Access Memory]
    C --> D[Long-Term Memory]
    D -.->|Retrieve patterns| C
    C -.->|Retrieve notes| B
    B --> E[LLM Reasoning]
    E --> F[Response]
    E -->|Store insights| C
    C -->|Consolidate| D
```

The Multi-Turn Reasoning Problem

Real-world applications demand iterative, multi-turn reasoning in which the model must maintain coherence across dozens or hundreds of exchanges. Current approaches typically append the full conversational history to the context, so the prompt grows without bound while reasoning quality degrades.

TurnBench evaluations highlight six recurring failure modes: reasoning bias, task drift, hallucination, overconfidence, memory decay, and error propagation.

Three-Layer Memory Architecture

CogMem implements a hierarchical memory system inspired by Baddeley's model of human working memory:

Layer 1: Long-Term Memory (LTM). Consolidates cross-session reasoning strategies, reusable problem-solving patterns, and distilled insights accumulated across multiple interactions. LTM entries are stored in a vectorized database enabling high-speed semantic retrieval. This layer evolves over time as new information is integrated.

Layer 2: Direct Access Memory (DA). Functions as session-level working memory, maintaining concise notes of intermediate conclusions, sub-goals, and ongoing plans. Rather than storing full conversational histories, DA preserves structured summaries of key information. At session start, DA is populated with relevant memories retrieved from LTM.

Layer 3: Focus of Attention (FoA). Dynamically reconstructs a minimal, task-relevant reasoning context for each turn by selectively integrating current DA notes, retrieved LTM entries, summarized dialogue history, and new user input into a compact prompt.
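The three layers above can be sketched as simple data structures. This is a minimal illustration under stated assumptions, not the paper's implementation: the class and method names (`LongTermMemory.semantic_search`, `DirectAccessMemory.retrieve`) are hypothetical, and the dot-product scoring stands in for a real vectorized database.

```python
# Illustrative sketch of CogMem's LTM and DA layers; names and
# interfaces are assumptions, not the paper's implementation.
from dataclasses import dataclass


@dataclass
class LTMEntry:
    text: str
    embedding: list  # vector used for semantic retrieval


class LongTermMemory:
    """Cross-session store; retrieval by embedding similarity."""

    def __init__(self):
        self.entries = []

    def add(self, text, embedding):
        self.entries.append(LTMEntry(text, embedding))

    def semantic_search(self, query_embedding, top_k=3):
        # Dot product as a stand-in for a vector-database similarity query
        def score(e):
            return sum(a * b for a, b in zip(e.embedding, query_embedding))
        ranked = sorted(self.entries, key=score, reverse=True)
        return [e.text for e in ranked[:top_k]]


class DirectAccessMemory:
    """Session-level working memory holding structured notes,
    not full conversational histories."""

    def __init__(self):
        self.notes = []

    def add_note(self, note):
        self.notes.append(note)

    def retrieve(self, top_k=5):
        return self.notes[-top_k:]  # most recent structured notes
```

At session start, the DA would be seeded by calling `semantic_search` on the LTM and copying the hits into `notes`, matching the population step described above.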

Mapping to Baddeley's Working Memory

CogMem's architecture directly parallels the influential cognitive psychology model:

| CogMem Component | Baddeley's Model | Function |
|---|---|---|
| Focus of Attention | Central Executive | Attentional control, selective focus |
| Direct Access Memory | Phonological Loop / Visuospatial Sketchpad | Short-term session storage |
| Long-Term Memory | Declarative Memory | Persistent knowledge and strategies |

This grounding in cognitive science ensures the architecture captures how humans maintain coherent reasoning across extended interactions while managing limited attentional resources.

Focus of Attention Mechanism

The FoA operates as a dynamic gating function that prevents unbounded context growth. At each turn:

  1. Retrieves task-relevant notes from DA based on current reasoning state
  2. Queries LTM semantically for applicable prior reasoning patterns
  3. Incorporates a summarized (not full) dialogue history
  4. Integrates the new user input
  5. Synthesizes these elements into a compact, interpretable prompt
```python
# Simplified CogMem Focus of Attention reconstruction
def compose_prompt(*sections):
    # Join the retrieved elements into one compact prompt string
    return "\n\n".join(str(s) for s in sections)

class FocusOfAttention:
    def __init__(self, ltm, da_memory):
        self.ltm = ltm        # Long-term vectorized store
        self.da = da_memory   # Session-level working memory

    def reconstruct_context(self, current_input, reasoning_state):
        # Step 1: Retrieve relevant session notes
        da_notes = self.da.retrieve(reasoning_state, top_k=5)

        # Step 2: Query long-term reasoning patterns
        ltm_patterns = self.ltm.semantic_search(current_input, top_k=3)

        # Step 3: Summarize dialogue history
        history_summary = self.da.get_compressed_history()

        # Step 4: Synthesize compact context
        return compose_prompt(
            ltm_patterns,     # Cross-session strategies
            da_notes,         # Current session state
            history_summary,  # Compressed history
            current_input     # New user turn
        )
```

Addressing Reasoning Failure Modes

CogMem's layered design targets each failure mode through specific mechanisms:

Reasoning Bias and Task Drift. LTM accumulates cross-session reasoning strategies, allowing the model to recognize and correct systematic biases by referencing prior successful patterns. DA's structured notes keep the current task goal explicitly represented.

Hallucination and Overconfidence. DA creates explicit checkpoints where conclusions must be justified. FoA enforces grounding in retrieved memories rather than relying solely on parametric knowledge.

Memory Decay. Prevented through three mechanisms: DA's session-level preservation of intermediate conclusions, LTM's persistent storage of distilled reasoning traces, and FoA's selective retrieval ensuring nothing is lost through context truncation.

Error Propagation. A dual-agent design, in which a reasoning agent and a memory agent collaborate continuously, enables error recovery by restoring coherence across turns rather than propagating early mistakes.
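A rough sketch of how such a dual-agent loop might recover from an error follows. The agent split, the method names, and the rollback rule (revert to the last justified checkpoint when an unjustified contradiction appears) are illustrative assumptions, not the paper's design:

```python
# Hypothetical dual-agent error-recovery loop; the checkpoint and
# rollback policy shown here are assumptions for illustration.
class MemoryAgent:
    def __init__(self):
        self.notes = []

    def checkpoint(self, conclusion, justification):
        # Store only justified conclusions, giving later turns a
        # trusted record to fall back on when errors are detected
        self.notes.append({"conclusion": conclusion, "why": justification})

    def last_consistent_state(self):
        return self.notes[-1] if self.notes else None


def reasoning_turn(user_input, memory_agent, llm_answer):
    """One turn: accept a justified answer, or roll back an
    unjustified contradiction to the last consistent checkpoint."""
    state = memory_agent.last_consistent_state()
    contradicts = state and llm_answer["conclusion"] != state["conclusion"]
    if contradicts and not llm_answer.get("why"):
        # Restore coherence from memory instead of propagating the error
        return state
    memory_agent.checkpoint(llm_answer["conclusion"], llm_answer.get("why", ""))
    return llm_answer
```

The key point matches the prose above: the memory agent's record, not the latest model output, is the source of truth across turns.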

Mathematical Formulation

The context at turn $t$ is bounded rather than growing linearly:

$$C_t = \text{FoA}(\text{DA}_t, \text{LTM}, h_t, x_t) \quad \text{where} \quad |C_t| \leq K \quad \forall t$$

compared to the standard approach, where $|C_t| = \sum_{i=1}^{t} \left( |x_i| + |r_i| \right)$ grows without bound.
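The growth difference can be illustrated numerically. The per-exchange token count and the budget $K$ below are made-up values, chosen only to show the linear-versus-bounded contrast:

```python
# Illustrative context-size comparison; token counts are invented
# and budget_K plays the role of K in the bound |C_t| <= K.
def full_history_context(turns, tokens_per_exchange=200):
    # |C_t| = sum of all prior inputs and responses: linear growth
    return [t * tokens_per_exchange for t in range(1, turns + 1)]

def foa_context(turns, budget_K=600):
    # FoA re-synthesizes a compact prompt, so |C_t| <= K for every t
    return [min(t * 200, budget_K) for t in range(1, turns + 1)]
```

After 10 turns, `full_history_context(10)` ends at 2000 tokens and keeps growing, while `foa_context(10)` plateaus at the 600-token budget.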

TurnBench Evaluation Results

Experiments were conducted on TurnBench-MS using Gemini 2.5 Flash as the reasoning agent.

The results demonstrate that structured memory improves reasoning accuracy while keeping computation bounded: a rare case where better performance comes with lower cost.

Significance

CogMem transforms LLM behavior from reactive, context-dependent reasoning into adaptive, self-consistent systems capable of sustained multi-turn inference. By drawing on decades of cognitive psychology research, it provides a principled architecture for managing the fundamental tension between comprehensive memory and bounded computation.

The three-layer design offers a general template for any system that must reason coherently over extended interactions – from customer support to scientific research assistants to autonomous coding agents.
