====== Memory Management for LLM Agents ======

===== Introduction =====

Effective memory management is essential for Large Language Model (LLM) agents to maintain context, recall past interactions, and enhance performance over time. As of 2025, the field has evolved from simple conversation buffers to sophisticated multi-tier memory systems inspired by human cognitive architecture. This article examines the memory landscape for LLM agents, covering memory types, dedicated memory frameworks, and general-purpose agent libraries with memory capabilities.

<code>
graph TD
    Input[Environment Input] --> SM[Sensory Memory]
    SM -->|Attention| STM[Short-Term Memory]
    STM -->|Consolidation| LTM[Long-Term Memory]
    LTM -->|Retrieval| STM
    STM --> Agent[Agent Core / LLM]
    Agent -->|Action| Output[Response / Action]
    style Agent fill:#69f,stroke:#333
    style LTM fill:#f96,stroke:#333
</code>

===== Memory Types for Agents =====

Agent memory systems draw from cognitive science, organizing information into complementary types:

**[[sensory_memory|Sensory Memory]]** is the initial processing of raw multimodal input (vision, text, audio) through encoder modules like Vision Transformers, CLIP, and Whisper. It acts as a high-bandwidth buffer where attention mechanisms filter what gets promoted to working memory.

**[[short_term_memory|Short-Term/Working Memory]]** corresponds to the LLM's context window (128K-1M+ tokens in 2025). It holds the current conversation, retrieved facts, and reasoning traces. KV caches, chain-of-thought scratchpads, and in-context learning all operate within this tier.

**[[long_term_memory|Long-Term Memory]]** uses external storage (vector databases, knowledge graphs, structured stores) to persist information across sessions. This tier has effectively unlimited capacity but requires retrieval mechanisms to access.

**[[explicit_memory|Explicit/Declarative Memory]]** stores facts, events, and concepts that can be directly queried: user preferences, domain knowledge, interaction history.
It is typically implemented via vector stores and knowledge graphs.

**[[implicit_memory|Implicit/Procedural Memory]]** encodes learned skills and behaviors in model weights through pretraining and fine-tuning. This includes tool-use patterns, reasoning procedures, and response-formatting habits.

These types are organized in [[hierarchical_memory|hierarchical architectures]] where information flows between tiers through consolidation, eviction, and retrieval operations. See [[memory_augmentation_strategies|memory augmentation strategies]] for techniques that enhance these systems.

===== Python Example: Simple Memory Store with Embedding Retrieval =====

<code python>
import numpy as np
from openai import OpenAI
from dataclasses import dataclass, field
from datetime import datetime

client = OpenAI()

@dataclass
class MemoryEntry:
    text: str
    embedding: np.ndarray
    timestamp: datetime = field(default_factory=datetime.now)
    metadata: dict = field(default_factory=dict)

class AgentMemory:
    def __init__(self, model: str = "text-embedding-3-small"):
        self.model = model
        self.entries: list[MemoryEntry] = []

    def _embed(self, text: str) -> np.ndarray:
        resp = client.embeddings.create(input=text, model=self.model)
        return np.array(resp.data[0].embedding, dtype="float32")

    def store(self, text: str, metadata: dict | None = None):
        embedding = self._embed(text)
        self.entries.append(MemoryEntry(text=text, embedding=embedding, metadata=metadata or {}))

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        # Rank all stored entries by cosine similarity to the query embedding.
        query_emb = self._embed(query)
        scores = []
        for entry in self.entries:
            sim = np.dot(query_emb, entry.embedding) / (
                np.linalg.norm(query_emb) * np.linalg.norm(entry.embedding)
            )
            scores.append((sim, entry))
        scores.sort(key=lambda x: x[0], reverse=True)
        return [entry.text for _, entry in scores[:top_k]]

memory = AgentMemory()
memory.store("User prefers Python over JavaScript for backend work.")
memory.store("Last project used FastAPI with PostgreSQL.")
memory.store("User is interested in vector databases and HNSW.")

relevant = memory.retrieve("What tech stack does the user like?")
print("Retrieved memories:", relevant)
</code>

===== Dedicated Memory Frameworks (2025) =====

A new category of tools has emerged focused specifically on providing persistent memory for agents:

==== Letta (MemGPT) ====

  * **Website**: [[https://www.letta.com/]]
  * **GitHub**: [[https://github.com/letta-ai/letta]]
  * **Architecture**: Full agent platform with OS-inspired tiered memory
  * **Features**:
    * **Core memory**: Key-value blocks always in LLM context (like RAM), self-edited by the agent
    * **Recall memory**: Searchable conversation history (like a disk cache)
    * **Archival memory**: Long-term vector storage for persistent knowledge
    * Agents manage their own memory via tool calls (read, write, search, archive)
    * Cloud sync for cross-session persistence and multi-agent memory sharing
  * **Origin**: Based on MemGPT, which introduced the concept of LLMs as operating systems managing their own virtual memory(([[https://arxiv.org/abs/2310.08560|Packer et al., 2023 -- MemGPT: Towards LLMs as Operating Systems]]))

==== Zep ====

  * **Website**: [[https://www.getzep.com/]]
  * **GitHub**: [[https://github.com/getzep/graphiti]] (~24K stars)
  * **Architecture**: Temporal knowledge graphs via the Graphiti engine
  * **Features**:
    * Low-latency memory retrieval (<200ms)
    * Entity and relationship modeling with fact-evolution tracking
    * Scores 63.8% on the LongMemEval benchmark (with GPT-4o)
    * SOC 2 and HIPAA compliance for enterprise deployments
    * Combines memory graphs, RAG, and data connectors
    * Strongest for temporal reasoning and contradiction resolution

==== Mem0 ====

  * **Website**: [[https://mem0.ai/]]
  * **GitHub**: [[https://github.com/mem0ai/mem0]]
  * **Architecture**: Dual-store (vector database + knowledge graph in the Pro tier)
  * **Features**:
    * Extracts atomic facts from conversations automatically
    * Memory scoped by user, session, or agent
    * Supports backends: Qdrant, Chroma, and others
    * Pro tier adds a Neo4j knowledge graph for entity tracking
    * Lightweight, plug-and-play integration with any LLM application
    * Reports up to 90% token cost reduction vs. full-context replay
==== LangMem (LangChain) ====

  * **GitHub**: [[https://github.com/langchain-ai/langmem]] (~1.3K stars)
  * **Architecture**: Flat key-value + vector stores, MIT-licensed
  * **Features**:
    * Deep integration with LangGraph for long-term memory in workflows
    * Background extraction and consolidation of memories
    * Prompt optimization based on stored preferences
    * Zero-infrastructure simplicity for LangGraph users

==== Google Memory Bank ====

  * **Architecture**: Framework-agnostic persistent memory via the Agent Development Kit (ADK)
  * **Features**:
    * Automatic extraction and storage of key information during agent interactions
    * REST API for retrieval and updates
    * Works with LangGraph, LlamaIndex, and other frameworks
    * No manual memory management code required

==== Cognee ====

  * **GitHub**: [[https://github.com/cognee-ai/cognee]]
  * **Features**:
    * Open-source framework for knowledge and memory management
    * Graph-enhanced storage with automatic dataset generation
    * Uses dlt as the data loader and DuckDB as the metastore

===== General-Purpose Agent Frameworks =====

These frameworks include memory as part of broader agent capabilities:

==== LangChain ====

  * **Website**: [[https://www.langchain.com/]]
  * **GitHub**: [[https://github.com/langchain-ai/langchain]]
  * **Features**:
    * Supports short-term and long-term memory via modular components
    * Integrates with 21+ memory providers (Cassandra, Elasticsearch, MongoDB, Postgres, Redis)
    * Buffer, summary, and entity memory types for conversation management
    * LangGraph adds stateful workflows with persistent checkpointing

==== LlamaIndex ====

  * **Website**: [[https://www.llamaindex.ai/]]
  * **GitHub**: [[https://github.com/run-llama/llama_index]] (~48K stars)
  * **Features**:
    * Composable memory buffers for RAG-heavy agents
    * Supports 160+ data sources with advanced indexing
    * LlamaCloud integration for managed retrieval
    * Customizable RAG workflows with query planning
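The buffer and summary memory types these frameworks expose can be approximated in a few lines: keep recent turns verbatim under a size budget, and fold evicted turns into a running summary. The ''SummaryBufferMemory'' below is a hypothetical sketch, not any framework's actual class; it uses naive word counts and string concatenation where a real implementation would count tokens and call an LLM summarizer.

```python
class SummaryBufferMemory:
    """Sketch of buffer+summary memory: recent turns stay verbatim,
    older turns collapse into a running summary string."""

    def __init__(self, max_words: int = 50):
        self.max_words = max_words      # crude stand-in for a token budget
        self.buffer: list[str] = []     # recent turns, kept verbatim
        self.summary: str = ""          # compressed older history

    def _words(self) -> int:
        return sum(len(turn.split()) for turn in self.buffer)

    def add_turn(self, turn: str) -> None:
        self.buffer.append(turn)
        # Evict oldest turns once over budget; a real implementation
        # would ask an LLM to merge them into the summary.
        while self._words() > self.max_words and len(self.buffer) > 1:
            evicted = self.buffer.pop(0)
            self.summary = (self.summary + " " + evicted).strip()

    def context(self) -> str:
        # Assemble what gets placed in the LLM prompt each turn.
        parts = []
        if self.summary:
            parts.append("Summary of earlier conversation: " + self.summary)
        parts.extend(self.buffer)
        return "\n".join(parts)

mem = SummaryBufferMemory(max_words=12)
mem.add_turn("User: I want to build a REST API in Python.")
mem.add_turn("Assistant: FastAPI is a good fit for that.")
mem.add_turn("User: How do I add a database?")
print(mem.context())
```

The design trades recall fidelity for a bounded prompt: the context never grows past the budget, at the cost of lossy compression of older turns.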
==== Microsoft Semantic Kernel ====

  * **GitHub**: [[https://github.com/microsoft/semantic-kernel]]
  * **Features**:
    * Enterprise-grade memory management with plugin architecture
    * Supports multiple vector store backends
    * Native integration with Azure AI services

==== CrewAI ====

  * **GitHub**: [[https://github.com/joaomdmoura/crewAI]]
  * **Features**:
    * Shared memory for multi-agent collaboration
    * Role-based memory scoping for agent teams

==== AutoGPT ====

  * **GitHub**: [[https://github.com/Significant-Gravitas/Auto-GPT]]
  * **Features**:
    * Persistent memory for autonomous agent loops
    * File-based and vector-based memory backends

===== Key Research =====

  * **MemGPT** ([[https://arxiv.org/abs/2310.08560|Packer et al., 2023]]): Introduced OS-like memory management for LLM agents with virtual memory paging
  * **"Memory in the Age of AI Agents"** ([[https://arxiv.org/abs/2512.13564|arXiv:2512.13564]]): Comprehensive taxonomy of agent memory (parametric, latent, token-level) across formation, consolidation, and evolution(([[https://arxiv.org/abs/2512.13564|"Memory in the Age of AI Agents." arXiv:2512.13564, 2025.]]))
  * **A-Mem** ([[https://arxiv.org/abs/2502.12110|Xu et al., 2025]]): Agentic memory with Zettelkasten-inspired dynamic organization(([[https://arxiv.org/abs/2502.12110|Xu, W. et al. "A-MEM: Agentic Memory for LLM Agents." arXiv:2502.12110, 2025.]]))
  * **H-MEM** ([[https://arxiv.org/abs/2507.22925|Sun et al., 2025]]): Four-level semantic hierarchy with sublinear scaling(([[https://arxiv.org/abs/2507.22925|Sun, H. et al. "H-MEM: Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents." arXiv:2507.22925, 2025.]]))
  * **G-Memory** ([[https://arxiv.org/abs/2506.07398|Zhang et al., 2025]]): Three-layer graph memory for multi-agent collaboration(([[https://arxiv.org/abs/2506.07398|Zhang, G. et al. "G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems." arXiv:2506.07398, 2025.]]))
  * **CDMem** (Gao et al., 2025): Context-dependent hierarchical encoding achieving 85.8% on ALFWorld
  * **MARK** ([[https://arxiv.org/abs/2505.05177|Ganguli et al., 2025]]): Refined memory with hallucination suppression for multi-agent systems(([[https://arxiv.org/abs/2505.05177|Ganguli, A. et al. "MARK: Memory Augmented Refinement of Knowledge." arXiv:2505.05177, 2025.]]))
  * **SAGE** ([[https://arxiv.org/abs/2409.00872|Liang et al., 2024]]): Ebbinghaus forgetting curves for agent memory management(([[https://arxiv.org/abs/2409.00872|Liang et al. "SAGE." arXiv:2409.00872, 2024.]]))

===== Memory Infrastructure =====

The retrieval layer underlying agent memory relies on efficient similarity search:

  * **[[faiss|FAISS]]** - Meta's vector similarity search library, supporting IVF, HNSW, and PQ indexes at billion scale
  * **[[hnsw_graphs|HNSW Graphs]]** - The dominant ANN algorithm in modern vector databases
  * **[[scann|ScaNN]]** - Google's library with anisotropic vector quantization
  * **[[approximate_nearest_neighbors|Approximate Nearest Neighbors]]** - The broader ANN algorithm landscape
  * **[[maximum_inner_product_search|MIPS]]** - The retrieval operation underlying embedding-based search
  * **[[locality_sensitive_hashing|LSH]]** - Hash-based approximate search with provable guarantees

===== See Also =====

  * [[hierarchical_memory|Hierarchical Memory and Context Management]]
  * [[memory_augmentation_strategies|Memory Augmentation Strategies]]
  * [[sensory_memory|Sensory Memory]]
  * [[short_term_memory|Short-Term Memory]]
  * [[long_term_memory|Long-Term Memory]]
  * [[explicit_memory|Explicit Memory]]
  * [[implicit_memory|Implicit Memory]]

===== References =====