Long-term memory in AI agent systems refers to persistent storage mechanisms that allow agents to retain and retrieve information across sessions and over extended time periods. Unlike short-term memory constrained by context window limits, long-term memory leverages external storage systems such as vector databases, knowledge graphs, or structured datastores to maintain an effectively unbounded repository of past experiences, learned facts, and procedural knowledge. This capability is critical for building agents that can accumulate knowledge, personalize their behavior, and avoid repeating past mistakes.
Long-term memory for agents can be categorized following cognitive science distinctions:
Episodic Memory stores records of specific past experiences and interactions, typically as timestamped sequences. An agent's episodic memory might contain conversation logs, task execution traces, or records of past decisions and their outcomes. Systems like Zep specialize in episodic memory with real-time summarization and temporal knowledge graphs. Episodic memory enables agents to answer questions like “What did the user ask about last week?” or “How did I solve this problem before?”
Semantic Memory stores general world knowledge, facts, and concepts abstracted from specific experiences. This maps to explicit/declarative memory and includes structured knowledge bases, entity relationships, and domain facts. Mem0 excels at extracting atomic facts from conversations and storing them as user-scoped semantic memories.
Procedural Memory encodes learned skills and behavioral patterns, corresponding to implicit memory. In agent systems, this manifests as fine-tuned model weights, learned tool-use patterns, and optimized prompt templates.
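The three-way taxonomy above can be sketched as a single store that tags each memory with its cognitive type. This is an illustrative sketch only; the `MemoryStore` and `MemoryItem` names are hypothetical and not taken from any particular framework:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    kind: str            # "episodic" | "semantic" | "procedural"
    content: str
    timestamp: str = ""  # episodic items carry a time reference

@dataclass
class MemoryStore:
    items: list = field(default_factory=list)

    def add(self, kind, content, timestamp=""):
        self.items.append(MemoryItem(kind, content, timestamp))

    def recall(self, kind):
        """Return all memories of one cognitive type."""
        return [m.content for m in self.items if m.kind == kind]

store = MemoryStore()
store.add("episodic", "User asked about HNSW indexing", "2025-03-10T09:00")
store.add("semantic", "The user works primarily in Python")
store.add("procedural", "Prompt template v3 works best for code review")

print(store.recall("semantic"))  # → ['The user works primarily in Python']
```

In practice each type usually lives in a different substrate (vector store for episodic/semantic, model weights or prompt libraries for procedural), but the type tag makes the distinction concrete.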
A complementary approach to long-term memory involves organizing information around persistent contexts or projects rather than purely by memory type. Persistent context groups chats, files, and custom instructions into dedicated containers—often called notebooks or homes—that persist across sessions and prevent AI systems from losing project-specific details1).
This pattern enables users to build on ongoing research, side projects, and collaborative work without requiring repetitive explanations of context. Google's Gemini implements persistent context through notebooks that sync across applications like NotebookLM, allowing seamless continuation of work across different tools and sessions. Unlike traditional retrieval-based memory that must be queried, persistent context is always implicitly available to the agent within its designated namespace, reducing cognitive load on both the agent and user.
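A minimal sketch of the persistent-context pattern: a namespace ("notebook") bundles chats, files, and instructions, and the whole bundle is injected into the prompt rather than retrieved by query. The `Notebook` class and `build_prompt` function are hypothetical, not any vendor's API:

```python
from dataclasses import dataclass, field

@dataclass
class Notebook:
    name: str
    instructions: str = ""
    files: dict = field(default_factory=dict)       # filename -> content
    chat_history: list = field(default_factory=list)

def build_prompt(notebook: Notebook, user_message: str) -> str:
    """Unlike retrieval-based memory, the full context is always present."""
    files = "\n".join(f"[{name}]\n{body}" for name, body in notebook.files.items())
    history = "\n".join(notebook.chat_history[-10:])  # keep recent turns only
    return (
        f"Project: {notebook.name}\n"
        f"Instructions: {notebook.instructions}\n"
        f"Files:\n{files}\n"
        f"History:\n{history}\n"
        f"User: {user_message}"
    )

nb = Notebook("thesis", instructions="Cite sources in APA style.")
nb.files["outline.md"] = "1. Intro\n2. Methods"
nb.chat_history.append("User: summarize chapter 2")
prompt = build_prompt(nb, "Continue where we left off.")
print("Cite sources in APA style." in prompt)  # → True
```

The trade-off versus retrieval: persistent context is simpler and never misses relevant details, but it consumes context-window budget proportional to the project's size.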
```python
import chromadb
from datetime import datetime

# Initialize a persistent ChromaDB client (data survives restarts)
client = chromadb.PersistentClient(path="./agent_memory_db")

# Create a collection for agent long-term memory
collection = client.get_or_create_collection(
    name="agent_memory",
    metadata={"hnsw:space": "cosine"},  # use cosine similarity
)

# Store memories with metadata for filtering
memories = [
    ("User prefers concise answers with code examples.", {"type": "preference", "user": "alice"}),
    ("Deployed the API to production on 2025-03-15.", {"type": "event", "user": "alice"}),
    ("The database schema uses UUID primary keys.", {"type": "fact", "user": "alice"}),
    ("User asked about HNSW indexing for vector search.", {"type": "topic", "user": "alice"}),
]
for i, (text, meta) in enumerate(memories):
    meta["timestamp"] = datetime.now().isoformat()
    collection.add(
        documents=[text],
        metadatas=[meta],
        ids=[f"mem_{i}"],
    )

# Retrieve relevant memories by semantic similarity
results = collection.query(
    query_texts=["What does the user like in responses?"],
    n_results=2,
)
print("Retrieved:", results["documents"])

# Filter by metadata (e.g., only preferences)
filtered = collection.query(
    query_texts=["user settings"],
    n_results=3,
    where={"type": "preference"},
)
print("Preferences:", filtered["documents"])

# Count total stored memories
print(f"Total memories: {collection.count()}")
```
Vector databases are the primary infrastructure for agent long-term memory, storing embedding representations for semantic similarity search:
Pinecone is a fully managed, serverless vector database optimized for production-scale agent systems. It supports hybrid search (dense + sparse vectors), metadata filtering, and namespaces for multi-tenant isolation. Widely used in enterprise RAG pipelines.
Weaviate is an open-source vector database with built-in vectorization modules, hybrid search combining BM25 and vector similarity, and a GraphQL API. It supports generative search where retrieved results are fed directly to LLMs for answer synthesis.
Chroma is a lightweight, local-first embedding database popular for prototyping and development. Its simplicity and Python-native API make it the default choice for LangChain and LlamaIndex tutorials, though it scales to production with client-server mode.
Milvus is a distributed vector database designed for billion-scale datasets. It supports multiple index types (FAISS, HNSW, ScaNN), GPU acceleration, and dynamic schema. Used by major tech companies for production search and recommendation.
All of these support the core approximate nearest neighbor (ANN) search and maximum inner product search (MIPS) operations that underpin retrieval-augmented generation.
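The retrieval primitives these databases accelerate can be shown with a brute-force sketch: cosine similarity for ANN-style search and inner products for MIPS. Real systems replace this linear scan with HNSW or IVF indexes; the vectors below are toy data:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mips(query, vectors):
    """Return the index of the vector with the largest inner product."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    return max(range(len(scores)), key=scores.__getitem__)

memory_vectors = [
    [0.9, 0.1, 0.0],   # e.g. "user preferences"
    [0.0, 0.8, 0.6],   # e.g. "deployment events"
    [0.1, 0.2, 0.9],   # e.g. "schema facts"
]
query = [1.0, 0.0, 0.1]  # embedding of "what does the user like?"

best = mips(query, memory_vectors)
print(best)  # → 0 (the "user preferences" vector)
print(round(cosine(query, memory_vectors[best]), 3))
```

An index like HNSW trades a small amount of recall for sub-linear query time, which is what makes billion-scale memory retrieval feasible.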
Effective long-term memory requires not just storage but active management:
Memory Consolidation involves promoting important short-term observations to long-term storage. MemGPT/Letta agents self-edit their memory, deciding what to archive from the conversation context. Google's Memory Bank (2025) automates extraction of key facts from agent interactions, storing them persistently via REST API. KAIROS, a background daemon in the Claude Code architecture also known as 'Dream Mode', takes an alternative consolidation approach inspired by human sleep: after 24 hours of inactivity it performs reflective passes over memory files, synthesizing learnings into durable, well-organized data and maintaining a lightweight index for future agent sessions2).
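The consolidation step can be sketched as a filter that scores each short-term observation and promotes only the important ones. The keyword heuristic and threshold below are purely illustrative stand-ins; frameworks like MemGPT have the LLM itself make this decision via tool calls:

```python
# Markers that crudely signal durable, user-specific information
IMPORTANT_MARKERS = ("prefers", "always", "never", "deadline", "decided")

def importance(observation: str) -> float:
    """Keyword heuristic standing in for an LLM-based importance score."""
    text = observation.lower()
    return float(sum(marker in text for marker in IMPORTANT_MARKERS))

def consolidate(short_term: list, long_term: list, threshold: float = 1.0):
    """Promote sufficiently important observations to long-term memory."""
    for obs in short_term:
        if importance(obs) >= threshold:
            long_term.append(obs)
    short_term.clear()  # the context window is flushed either way

short_term = [
    "User said hello",
    "User prefers tabs over spaces",
    "The project deadline is Friday",
]
long_term = []
consolidate(short_term, long_term)
print(long_term)  # → ['User prefers tabs over spaces', 'The project deadline is Friday']
```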
Forgetting and Pruning prevents unbounded memory growth and removes stale information. Strategies include time-decay functions that deprioritize older memories, LRU (least recently used) eviction, attention-weighted persistence scoring, and Ebbinghaus forgetting curves. SAGE (Liang et al., 2024)3) pioneered reflection-based forgetting using Ebbinghaus curves to balance retention and pruning.
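A time-decay pruning pass inspired by the Ebbinghaus curve can be sketched as retention R = exp(-t/S), where t is time since last access and S is a per-memory stability parameter; memories whose retention falls below a threshold are dropped. The parameter values here are illustrative, not from any cited system:

```python
import math

def retention(hours_since_access: float, stability: float = 24.0) -> float:
    """Ebbinghaus-style exponential decay: R = exp(-t/S)."""
    return math.exp(-hours_since_access / stability)

def prune(memories, threshold: float = 0.3):
    """memories: list of (text, hours_since_access, stability) tuples."""
    return [
        (text, age, s)
        for text, age, s in memories
        if retention(age, s) >= threshold
    ]

memories = [
    ("User's name is Alice", 2.0, 24.0),   # fresh -> kept
    ("Asked about weather", 72.0, 24.0),   # stale, low stability -> pruned
    ("Prefers UUID keys", 72.0, 120.0),    # old but stable -> kept
]
kept = prune(memories)
print([m[0] for m in kept])  # → ["User's name is Alice", 'Prefers UUID keys']
```

Raising a memory's stability each time it is recalled reproduces the spaced-repetition effect: frequently used memories decay more slowly.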
Contradiction Resolution addresses conflicting information accumulated over time. Zep's temporal knowledge graphs track when facts were established and updated, enabling the agent to prefer newer information when contradictions arise.
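The prefer-newer rule can be sketched with a fact store that records when each fact was established and lets a later assertion about the same subject and attribute supersede an earlier one, in the spirit of (but much simpler than) Zep's temporal knowledge graphs. The `FactStore` class is hypothetical:

```python
from datetime import datetime

class FactStore:
    def __init__(self):
        # (subject, attribute) -> (value, established_at)
        self.facts = {}

    def assert_fact(self, subject, attribute, value, established_at):
        key = (subject, attribute)
        current = self.facts.get(key)
        # Prefer newer information when facts conflict
        if current is None or established_at > current[1]:
            self.facts[key] = (value, established_at)

    def lookup(self, subject, attribute):
        entry = self.facts.get((subject, attribute))
        return entry[0] if entry else None

store = FactStore()
store.assert_fact("alice", "employer", "Acme", datetime(2024, 1, 5))
store.assert_fact("alice", "employer", "Globex", datetime(2025, 3, 1))
store.assert_fact("alice", "employer", "OldCo", datetime(2023, 6, 1))  # stale, ignored
print(store.lookup("alice", "employer"))  # → Globex
```

A production system would also retain the superseded values with validity intervals, so the agent can answer "where did Alice work in 2024?" rather than only the current state.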
Several frameworks provide integrated long-term memory for agents:
MemGPT (Packer et al., 2023)4) treats the LLM as a processor with an OS-like memory hierarchy: core memory (always in context, like RAM), recall memory (searchable conversation history, like disk cache), and archival memory (long-term vector storage). Agents use tool calls to page memories in and out, self-editing their persistent state.
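The paging behavior can be sketched as a fixed-capacity core memory that spills its oldest entries to archival storage, with a search tool to page them back into context. Class and method names here are illustrative, not the actual MemGPT/Letta API:

```python
class HierarchicalMemory:
    def __init__(self, core_capacity: int = 3):
        self.core = []       # always in the prompt, like RAM
        self.archival = []   # long-term store, reached via tool calls
        self.core_capacity = core_capacity

    def remember(self, item: str):
        """Add to core memory, paging out the oldest entries on overflow."""
        self.core.append(item)
        while len(self.core) > self.core_capacity:
            self.archival.append(self.core.pop(0))

    def archival_search(self, keyword: str):
        """Tool the agent calls to page old memories back into context."""
        return [m for m in self.archival if keyword.lower() in m.lower()]

mem = HierarchicalMemory(core_capacity=2)
for note in ["likes Rust", "deadline Friday", "uses Postgres", "timezone UTC"]:
    mem.remember(note)

print(mem.core)                     # → ['uses Postgres', 'timezone UTC']
print(mem.archival_search("rust"))  # → ['likes Rust']
```

The key idea the sketch preserves: nothing is lost when the context window fills; it merely moves down the hierarchy and costs a tool call to retrieve.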
Letta (evolved from MemGPT) extends this with cloud sync for cross-session agent persistence, a production-ready server, and multi-agent memory sharing. It represents the most complete implementation of hierarchical agent memory.
Zep provides low-latency (<200ms) memory with temporal knowledge graphs via the Graphiti engine. It scores 63.8% on LongMemEval and supports SOC2/HIPAA compliance. The open-source Graphiti library has ~24K GitHub stars.
Mem0 offers a lightweight memory layer that extracts atomic facts from conversations, scoped by user, session, or agent. The Pro tier adds knowledge graph support via Neo4j. It integrates with any LLM application and uses Qdrant or Chroma as vector backends.
Google Memory Bank (2025) provides framework-agnostic persistent memory for agents via the Agent Development Kit (ADK), automatically extracting and recalling information without manual memory management code.