Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
Agent memory architecture is the foundational system that transforms passive language models into persistent, adaptive agents capable of learning and reasoning across extended interactions. Memory enables three critical capabilities: state awareness (knowing what is happening now), persistence (retaining knowledge across sessions), and selection (deciding what is worth remembering).
Short-term memory provides temporary storage for immediate context and task state during active interactions. It functions as a buffer that enables the agent to maintain continuity across multiple reasoning and action steps.
Short-term memory is typically implemented as the LLM's context window or a managed conversation buffer, and is sufficient for most single-task completions.
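A managed conversation buffer of this kind can be sketched in a few lines. The class below is illustrative rather than drawn from any particular framework, and it approximates token counting with word counts for simplicity:

```python
# Minimal sketch of a short-term conversation buffer. Token budgeting is
# approximated by word count; a real system would use the model's tokenizer.

class ConversationBuffer:
    def __init__(self, max_tokens=5):
        self.max_tokens = max_tokens
        self.turns = []  # list of (role, text) tuples

    def add(self, role, text):
        self.turns.append((role, text))
        self._trim()

    def _trim(self):
        # Drop the oldest turns until the buffer fits the budget,
        # always keeping at least the most recent turn.
        while self._size() > self.max_tokens and len(self.turns) > 1:
            self.turns.pop(0)

    def _size(self):
        return sum(len(text.split()) for _, text in self.turns)

    def as_prompt(self):
        return "\n".join(f"{role}: {text}" for role, text in self.turns)

buf = ConversationBuffer(max_tokens=5)
buf.add("user", "What is the capital of France?")
buf.add("assistant", "Paris.")
buf.add("user", "And of Germany?")
print(buf.as_prompt())  # the oldest turn has been trimmed away
```

Trimming from the oldest end preserves recency, which is usually what matters for continuity within a single task.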
Long-term memory stores historical data including previously executed actions, outcomes, and environmental observations across sessions. This persistent layer is critical for agents operating over extended periods.
Long-term memory is typically implemented through external vector stores for fast semantic retrieval of historical information.
Episodic memory stores specific past experiences, interactions, and outcomes as discrete episodes. It operates similarly to case-based reasoning, allowing agents to retrieve and learn from similar past situations.
Semantic memory encompasses factual knowledge and embeddings, often implemented through RAG (Retrieval-Augmented Generation) systems that integrate external vector stores to retrieve facts on-the-fly. This reduces hallucinations and scales knowledge beyond what fits in parametric model weights.
Procedural memory encodes learned skills and tool usage patterns as executable functions and integrations. This includes the agent's capability to execute actions through APIs, code generation, or control of external systems.
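One common way to realize procedural memory is a registry that maps skill names to executable functions, which the agent invokes by name. The sketch below is a hypothetical minimal version (the `ToolRegistry` name and the sample tool are illustrative, not from any specific library):

```python
# Hedged sketch: procedural memory as a registry of named, executable skills.

class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, fn, description=""):
        # Store the callable alongside a description the agent can read
        # when deciding which tool to use.
        self._tools[name] = {"fn": fn, "description": description}

    def invoke(self, name, **kwargs):
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name]["fn"](**kwargs)

registry = ToolRegistry()
registry.register("add", lambda a, b: a + b, "Add two numbers")
print(registry.invoke("add", a=2, b=3))  # → 5
```

In practice the registered functions would wrap API calls or code execution rather than pure computation, but the lookup-and-invoke pattern is the same.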
Vector stores enable efficient semantic search and retrieval by converting text into high-dimensional embeddings. Agents query these stores to find relevant historical information without scanning entire interaction logs. Common choices include Pinecone, Weaviate, Qdrant, and Chroma.
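The core retrieval operation is cosine similarity over embeddings. The toy in-memory store below illustrates the idea; a bag-of-words vector stands in for a learned embedding so the example stays self-contained, whereas a real deployment would use one of the stores named above with a proper embedding model:

```python
# Toy in-memory vector store: embed, index, and rank by cosine similarity.

import math
from collections import Counter

def embed(text):
    # Stand-in for a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    def __init__(self):
        self.entries = []  # (embedding, original text)

    def add(self, text):
        self.entries.append((embed(text), text))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = VectorStore()
store.add("the user prefers metric units")
store.add("the deployment failed on Tuesday")
print(store.search("which units does the user like"))
```

Note the query retrieves the preference entry even though the wording differs, which is exactly what lets agents avoid scanning full interaction logs.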
Key-value stores handle simple state tracking, user preferences, and configuration. They provide fast lookups for structured data that does not require semantic search (e.g., Redis, DynamoDB).
Graph databases capture complex relationships, entity connections, and temporal sequences. They excel at multi-hop reasoning where the agent needs to traverse relationships between concepts.
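Multi-hop reasoning amounts to path traversal over an entity graph. A plain adjacency dict and breadth-first search are enough to show the shape of the operation (the entities here are invented for illustration; a graph database would execute this as a path query):

```python
# Minimal multi-hop traversal sketch: BFS over an adjacency dict,
# standing in for a graph-database path query.

from collections import deque

def find_path(graph, start, goal):
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no connection found

graph = {
    "Alice": ["Acme Corp"],
    "Acme Corp": ["Project X"],
    "Project X": ["Bob"],
}
print(find_path(graph, "Alice", "Bob"))  # three hops connect the two entities
```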
Production systems often implement tiered architectures that combine these stores, routing each kind of data to the backend best suited to it.
Effective memory systems require deliberate mechanisms for both consolidation and forgetting:
Memory consolidation moves information between short-term and long-term storage based on usage patterns, recency, and significance. This mimics how humans internalize knowledge, optimizing both recall speed and storage efficiency.
Intelligent forgetting prevents memory bloat through priority scoring and contextual tagging. Advanced systems use dynamic decay mechanisms where low-relevance entries gradually lose priority over time, freeing computational and storage resources.
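A dynamic decay mechanism of the kind described above can be sketched with a priority score that decays exponentially with age and is reinforced by access frequency. The formula and thresholds here are illustrative assumptions, not drawn from any specific system:

```python
# Hedged sketch of priority scoring with time-based decay. Entries whose
# score falls below a threshold are forgotten; frequently accessed entries
# are reinforced and survive longer.

import math

class MemoryEntry:
    def __init__(self, text, created_at):
        self.text = text
        self.created_at = created_at  # hours, on some shared clock
        self.access_count = 0

    def priority(self, now, half_life=24.0):
        # Exponential decay by age in hours, boosted by access frequency.
        age = now - self.created_at
        decay = math.exp(-math.log(2) * age / half_life)
        return decay * (1 + self.access_count)

def forget(entries, now, threshold=0.1):
    return [e for e in entries if e.priority(now) >= threshold]

old = MemoryEntry("one-off debug detail", created_at=0)
hot = MemoryEntry("user's preferred language", created_at=0)
hot.access_count = 10  # repeatedly retrieved, so reinforced
survivors = forget([old, hot], now=100)
print([e.text for e in survivors])
```

The same score can drive consolidation in the other direction: entries that stay above a higher threshold are good candidates for promotion from short-term to long-term storage.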
LangChain provides memory modules for building memory-enabled agents, facilitating integration of memory, APIs, and reasoning workflows. LangGraph extends this with hierarchical memory graphs that track dependencies and enable structured learning over time.
Mem0 provides a dedicated memory layer for AI agents with automatic memory extraction, consolidation, and retrieval. It handles the complexity of deciding what to remember and when to forget.
Letta implements virtual context management, treating LLM context as a form of virtual memory with paging between active context and external storage. This allows agents to work with effectively unlimited memory while maintaining the illusion of a single coherent context.
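The paging idea can be illustrated in miniature: a bounded set of "active" pages lives in context, everything else sits in external storage, and a read of an inactive page swaps something out to make room. This is a sketch of the general virtual-memory analogy only, not Letta's actual API or eviction policy:

```python
# Illustrative virtual-context paging: a bounded active window backed by
# external storage, with pages swapped in on demand.

class VirtualContext:
    def __init__(self, active_slots=2):
        self.active_slots = active_slots
        self.active = {}    # page_id -> content (currently in context)
        self.external = {}  # page_id -> content (paged out)

    def write(self, page_id, content):
        self.external[page_id] = content

    def read(self, page_id):
        if page_id not in self.active:
            self._page_in(page_id)
        return self.active[page_id]

    def _page_in(self, page_id):
        if len(self.active) >= self.active_slots:
            # Evict the most recently inserted active page back to storage.
            evicted, content = self.active.popitem()
            self.external[evicted] = content
        self.active[page_id] = self.external.pop(page_id)

ctx = VirtualContext(active_slots=1)
ctx.write("notes", "project kickoff summary")
ctx.write("prefs", "user prefers concise answers")
ctx.read("notes")
ctx.read("prefs")          # evicts "notes" to stay within the window
print(sorted(ctx.active))  # only "prefs" remains in active context
```

The agent sees one coherent read interface while the total addressable memory exceeds the active window, which is the essence of the virtual-memory analogy.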
Well-engineered memory systems support conversation continuity, sequential decision-making, knowledge transfer across sessions, error correction, and reflective reasoning where agents audit and learn from past outcomes.