Memory Management for LLM Agents

Introduction

Effective memory management is essential for Large Language Model (LLM) agents to maintain context, recall past interactions, and enhance performance over time. As of 2025, the field has evolved from simple conversation buffers to sophisticated multi-tier memory systems inspired by human cognitive architecture. This article examines the memory landscape for LLM agents, covering memory types, dedicated memory frameworks, and general-purpose agent libraries with memory capabilities.

Memory Types for Agents

Agent memory systems draw from cognitive science, organizing information into complementary types:

Sensory Memory is the initial processing of raw multimodal input (vision, text, audio) through encoder modules like Vision Transformers, CLIP, and Whisper. It acts as a high-bandwidth buffer where attention mechanisms filter what gets promoted to working memory.

Short-Term/Working Memory corresponds to the LLM's context window (128K-1M+ tokens in 2025). It holds the current conversation, retrieved facts, and reasoning traces. KV caches, chain-of-thought scratchpads, and in-context learning all operate within this tier.

Long-Term Memory uses external storage (vector databases, knowledge graphs, structured stores) to persist information across sessions. This tier has effectively unlimited capacity but requires retrieval mechanisms to access.

Explicit/Declarative Memory stores facts, events, and concepts that can be directly queried: user preferences, domain knowledge, interaction history. Implemented via vector stores and knowledge graphs.

Implicit/Procedural Memory encodes learned skills and behaviors in model weights through pretraining and fine-tuning. This includes tool-use patterns, reasoning procedures, and response formatting habits.

These types are organized in hierarchical architectures where information flows between tiers through consolidation, eviction, and retrieval operations. See memory augmentation strategies for techniques that enhance these systems.

Python Example: Simple Memory Store with Embedding Retrieval

import numpy as np
from [[openai|openai]] import [[openai|OpenAI]]
from dataclasses import dataclass, field
from datetime import datetime
 
client = [[openai|OpenAI]]()
 
@dataclass
class MemoryEntry:
    text: str
    embedding: np.ndarray
    timestamp: datetime = field(default_factory=datetime.now)
    metadata: dict = field(default_factory=dict)
 
class AgentMemory:
    def __init__(self, model: str = "text-embedding-3-small"):
        self.model = model
        self.entries: list[MemoryEntry] = []
 
    def _embed(self, text: str) -> np.ndarray:
        resp = client.[[embeddings|embeddings]].create(input=text, model=self.model)
        return np.array(resp.data[0].embedding, dtype="float32")
 
    def store(self, text: str, metadata: dict = None):
        embedding = self._embed(text)
        self.entries.append(MemoryEntry(text=text, embedding=embedding, metadata=metadata or {}))
 
    def retrieve(self, query: str, top_k: int = 3) -> liststr:
        query_emb = self._embed(query)
        scores = []
        for entry in self.entries:
            sim = np.dot(query_emb, entry.embedding) / (
                np.linalg.norm(query_emb) * np.linalg.norm(entry.embedding)
            )
            scores.append((sim, entry))
        scores.sort(key=lambda x: x[0], reverse=True)
        return [entry.text for _, entry in scores[:top_k]]
 
memory = AgentMemory()
memory.store("User prefers Python over JavaScript for backend work.")
memory.store("Last project used FastAPI with PostgreSQL.")
memory.store("User is interested in vector databases and HNSW.")
 
relevant = memory.retrieve("What tech stack does the user like?")
print("Retrieved memories:", relevant)

Memory Consolidation and Dreaming

Beyond passive storage, advanced agent memory systems employ active consolidation processes inspired by biological sleep and dreaming to optimize memory organization and defragmentation.

Agent Dreaming is a memory management feature that utilizes cyclical phases to consolidate and reorganize agent memory. Implemented in systems such as OpenClaw 2026.4.5¹⁾, the dreaming process operates in multiple phases:

Light Phase: Initial memory sorting and compression of recent interactions
Deep Phase: Extraction and consolidation of core facts and relationships, compression of redundant information
REM Phase: Recombination of memory fragments and emergence of new patterns and associations

A key feature of agent dreaming systems is the generation of a Dream Diary, a human-readable record that documents the agent's internal memory consolidation process. This allows users to understand and audit the agent's evolving internal state, memory priorities, and knowledge structure without directly inspecting embeddings or raw memory stores.

This approach mirrors human memory consolidation during sleep and provides transparency into how agents organize long-term knowledge while reducing memory fragmentation and improving retrieval efficiency over extended operational periods.

Dedicated Memory Frameworks (2025)

A new category of tools has emerged focused specifically on providing persistent memory for agents:

Letta (MemGPT)

Website: letta.com/]]
GitHub: letta-ai/letta]]
Architecture: Full agent platform with OS-inspired tiered memory
Features:
- Core memory: Key-value blocks always in LLM context (like RAM), self-edited by the agent
- Recall memory: Searchable conversation history (like disk cache)
- Archival memory: Long-term vector storage for persistent knowledge
- Agents manage their own memory via tool calls (read, write, search, archive)
- Cloud sync for cross-session persistence and multi-agent memory sharing
Origin: Based on MemGPT, which introduced the concept of LLMs as operating systems managing their own virtual memory²⁾

Zep

Website: https://www.getzep.com/
GitHub: https://github.com/getzep/graphiti (~24K stars)
Architecture: Temporal knowledge graphs via the Graphiti engine
Features:
- Low-latency memory retrieval (<200ms)
- Entity and relationship modeling with fact evolution tracking
- Scores 63.8% on LongMemEval benchmark (with GPT-4o)
- SOC2 and HIPAA compliance for enterprise deployments
- Combines memory graphs, RAG, and data connectors
- Strongest for temporal reasoning and contradiction resolution

Mem0

Website: mem0.ai/]]
GitHub: https://github.com/mem0ai/mem0
Architecture: Dual-store (vector database + knowledge graph in Pro tier)
Features:
- Extracts atomic facts from conversations automatically
- Memory scoped by user, session, or agent
- Supports backends: Qdrant, Chroma, and others
- Pro tier adds Neo4j knowledge graph for entity tracking
- Lightweight, plug-and-play integration with any LLM application
- Reports up to 90% token cost reduction vs. full-context replay

LangMem (LangChain)

GitHub: langchain-ai/langmem]] (~1.3K stars)
Architecture: Flat key-value + vector stores, MIT-licensed
Features:
- Deep integration with LangGraph for long-term memory in workflows
- Background extraction and consolidation of memories
- Prompt optimization based on stored preferences
- Zero-infrastructure simplicity for LangGraph users

Google Memory Bank

Architecture: Framework-agnostic persistent memory via Agent Development Kit (ADK)
Features:
- Automatic extraction and storage of key information during agent interactions
- REST API for retrieval and updates
- Works with LangGraph, LlamaIndex, and other frameworks
- No manual memory management code required

Cognee

GitHub: https://github.com/cognee-ai/cognee
Features:
- Open-source framework for knowledge and memory management
- Graph-enhanced storage with automatic dataset generation
- Uses dlt as data loader and DuckDB as metastore

General-Purpose Agent Frameworks

These frameworks include memory as part of broader agent capabilities:

LangChain

Website: langchain.com/]]
GitHub: langchain-ai/langchain]]
Features:
- Supports short-term and long-term memory via modular components
- Integrates with 21+ memory providers (Cassandra, Elasticsearch, MongoDB, Postgres, Redis)
- Buffer, summary, and entity memory types for conversation management
- LangGraph adds stateful workflows with persistent checkpointing

LlamaIndex

Website: llamaindex.ai/]]
GitHub: https://github.com/run-llama/llama_index (~48K stars)
Features:
- Composable memory buffers for RAG-heavy agents
- Supports 160+ data sources with advanced indexing
- LlamaCloud integration for managed retrieval
- Customizable RAG workflows with query planning

Microsoft Semantic Kernel

GitHub: microsoft/semantic-kernel]]
Features:
- Enterprise-grade memory management with plugin architecture
- Supports multiple vector store backends
- Native integration with Azure AI services

CrewAI

GitHub: https://github.com/joaomdmoura/crewAI
Features:
- Shared memory for multi-agent collaboration
- Role-based memory scoping for agent teams

AutoGPT

GitHub: https://github.com/Significant-Gravitas/Auto-GPT
Features:
- Persistent memory for autonomous agent loops
- File-based and vector-based memory backends

Key Research

MemGPT (Packer et al., 2023): Introduced OS-like memory management for LLM agents with virtual memory paging
“Memory in the Age of AI Agents” (arXiv:2512.13564): Comprehensive taxonomy of agent memory (parametric, latent, token-level) across formation, consolidation, and evolution³⁾
A-Mem (Xu et al., 2025): Agentic memory with Zettelkasten-inspired dynamic organization⁴⁾
H-MEM (Sun et al., 2025): Four-level semantic hierarchy with sublinear scaling⁵⁾
G-Memory (Zhang et al., 2025): Three-layer graph memory for multi-agent collaboration⁶⁾
CDMem (Gao et al., 2025): Context-dependent hierarchical encoding achieving 85.8% on ALFWorld
MARK (Ganguli et al., 2025): Refined memory with hallucination suppression for multi-agent systems⁷⁾
SAGE (Liang et al., 2024): Ebbinghaus forgetting curves for agent memory management⁸⁾

Memory Infrastructure

The retrieval layer underlying agent memory relies on efficient similarity search:

FAISS - Meta's vector similarity search library, supporting IVF, HNSW, and PQ indexes at billion scale
HNSW Graphs - The dominant ANN algorithm in modern vector databases
ScaNN - Google's library with anisotropic vector quantization
Approximate Nearest Neighbors - The broader ANN algorithm landscape
MIPS - The retrieval operation underlying embedding-based search
LSH - Hash-based approximate search with provable guarantees

References

¹⁾

Thursdai - Agent Dreaming in OpenClaw 2026.4.5

²⁾

Packer et al., 2023, MemGPT: Towards LLMs as Operating Systems

³⁾

"Memory in the Age of AI Agents." arXiv:2512.13564, 2025.

⁴⁾

Xu, W. et al. "A-MEM: Agentic Memory for LLM Agents." arXiv:2502.12110, 2025.

⁵⁾

Sun, H. et al. "H-MEM: Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents." arXiv:2507.22925, 2025.

⁶⁾

Zhang, G. et al. "G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems." arXiv:2506.07398, 2025.

⁷⁾

Ganguli, A. et al. "MARK: Memory Augmented Refinement of Knowledge." arXiv:2505.05177, 2025.

⁸⁾

Liang et al. "SAGE." arXiv:2409.00872, 2024.

AI Agent Knowledge Base

Sidebar

Table of Contents

Memory Management for LLM Agents

Introduction

Memory Types for Agents

Python Example: Simple Memory Store with Embedding Retrieval

Memory Consolidation and Dreaming

Dedicated Memory Frameworks (2025)

Letta (MemGPT)

Zep

Mem0

LangMem (LangChain)

Google Memory Bank

Cognee

General-Purpose Agent Frameworks

LangChain

LlamaIndex

Microsoft Semantic Kernel

CrewAI

AutoGPT

Key Research

Memory Infrastructure

See Also

References

AI Agent Knowledge Base

User Tools

Site Tools

Sidebar

Table of Contents

Memory Management for LLM Agents

Introduction

Memory Types for Agents

Python Example: Simple Memory Store with Embedding Retrieval

Memory Consolidation and Dreaming

Dedicated Memory Frameworks (2025)

Letta (MemGPT)

Zep

Mem0

LangMem (LangChain)

Google Memory Bank

Cognee

General-Purpose Agent Frameworks

LangChain

LlamaIndex

Microsoft Semantic Kernel

CrewAI

AutoGPT

Key Research

Memory Infrastructure

See Also

References

Page Tools