Long-term memory in AI agent systems refers to persistent storage mechanisms that allow agents to retain and retrieve information across sessions and over extended time periods. Unlike short-term memory constrained by context window limits, long-term memory leverages external storage systems such as vector databases, knowledge graphs, or structured datastores to maintain an effectively unbounded repository of past experiences, learned facts, and procedural knowledge. This capability is critical for building agents that can accumulate knowledge, personalize their behavior, and avoid repeating past mistakes.
Long-term memory for agents can be categorized following cognitive science distinctions:
Episodic Memory stores records of specific past experiences and interactions, typically as timestamped sequences. An agent's episodic memory might contain conversation logs, task execution traces, or records of past decisions and their outcomes. Systems like Zep specialize in episodic memory with real-time summarization and temporal knowledge graphs. Episodic memory enables agents to answer questions like “What did the user ask about last week?” or “How did I solve this problem before?”
Semantic Memory stores general world knowledge, facts, and concepts abstracted from specific experiences. This maps to explicit/declarative memory and includes structured knowledge bases, entity relationships, and domain facts. Mem0 excels at extracting atomic facts from conversations and storing them as user-scoped semantic memories.
Procedural Memory encodes learned skills and behavioral patterns, corresponding to implicit memory. In agent systems, this manifests as fine-tuned model weights, learned tool-use patterns, and optimized prompt templates.
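The three-way taxonomy above can be sketched as a single store that tags each memory with its cognitive type. This is an illustrative sketch only; the `MemoryStore` and `MemoryItem` names are hypothetical and not taken from any particular framework:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    kind: str            # "episodic" | "semantic" | "procedural"
    content: str
    timestamp: str = ""  # episodic items carry a time reference

@dataclass
class MemoryStore:
    items: list = field(default_factory=list)

    def add(self, kind, content, timestamp=""):
        self.items.append(MemoryItem(kind, content, timestamp))

    def recall(self, kind):
        """Return all memories of one cognitive type."""
        return [m.content for m in self.items if m.kind == kind]

store = MemoryStore()
store.add("episodic", "User asked about HNSW indexing", "2025-03-10T09:00")
store.add("semantic", "The user works primarily in Python")
store.add("procedural", "Prompt template v3 works best for code review")

print(store.recall("semantic"))  # → ['The user works primarily in Python']
```

In practice each type usually lives in a different substrate (vector store for episodic/semantic, model weights or prompt libraries for procedural), but the type tag makes the distinction concrete.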
A complementary approach to long-term memory involves organizing information around persistent contexts or projects rather than purely by memory type. Persistent context groups chats, files, and custom instructions into dedicated containers—often called notebooks or homes—that persist across sessions and prevent AI systems from losing project-specific details1).
This pattern enables users to build on ongoing research, side projects, and collaborative work without requiring repetitive explanations of context. Google's Gemini implements persistent context through notebooks that sync across applications like NotebookLM, allowing seamless continuation of work across different tools and sessions. Unlike traditional retrieval-based memory that must be queried, persistent context is always implicitly available to the agent within its designated namespace, reducing cognitive load on both the agent and user.
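A minimal sketch of the persistent-context pattern: a namespace ("notebook") bundles chats, files, and instructions, and the whole bundle is injected into the prompt rather than retrieved by query. The `Notebook` class and `build_prompt` function are hypothetical, not any vendor's API:

```python
from dataclasses import dataclass, field

@dataclass
class Notebook:
    name: str
    instructions: str = ""
    files: dict = field(default_factory=dict)       # filename -> content
    chat_history: list = field(default_factory=list)

def build_prompt(notebook: Notebook, user_message: str) -> str:
    """Unlike retrieval-based memory, the full context is always present."""
    files = "\n".join(f"[{name}]\n{body}" for name, body in notebook.files.items())
    history = "\n".join(notebook.chat_history[-10:])  # keep recent turns only
    return (
        f"Project: {notebook.name}\n"
        f"Instructions: {notebook.instructions}\n"
        f"Files:\n{files}\n"
        f"History:\n{history}\n"
        f"User: {user_message}"
    )

nb = Notebook("thesis", instructions="Cite sources in APA style.")
nb.files["outline.md"] = "1. Intro\n2. Methods"
nb.chat_history.append("User: summarize chapter 2")
prompt = build_prompt(nb, "Continue where we left off.")
print("Cite sources in APA style." in prompt)  # → True
```

The trade-off versus retrieval: persistent context is simpler and never misses relevant details, but it consumes context-window budget proportional to the project's size.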
```python
import chromadb
from datetime import datetime

# Initialize a persistent ChromaDB client (data survives restarts)
client = chromadb.PersistentClient(path="./agent_memory_db")

# Create a collection for agent long-term memory
collection = client.get_or_create_collection(
    name="agent_memory",
    metadata={"hnsw:space": "cosine"},  # use cosine similarity
)

# Store memories with metadata for filtering
memories = [
    ("User prefers concise answers with code examples.", {"type": "preference", "user": "alice"}),
    ("Deployed the API to production on 2025-03-15.", {"type": "event", "user": "alice"}),
    ("The database schema uses UUID primary keys.", {"type": "fact", "user": "alice"}),
    ("User asked about HNSW indexing for vector search.", {"type": "topic", "user": "alice"}),
]
for i, (text, meta) in enumerate(memories):
    meta["timestamp"] = datetime.now().isoformat()
    collection.add(
        documents=[text],
        metadatas=[meta],
        ids=[f"mem_{i}"],
    )

# Retrieve relevant memories by semantic similarity
results = collection.query(
    query_texts=["What does the user like in responses?"],
    n_results=2,
)
print("Retrieved:", results["documents"])

# Filter by metadata (e.g., only preferences)
filtered = collection.query(
    query_texts=["user settings"],
    n_results=3,
    where={"type": "preference"},
)
print("Preferences:", filtered["documents"])

# Count total stored memories
print(f"Total memories: {collection.count()}")
```
Vector databases are the primary infrastructure for agent long-term memory, storing embedding representations for semantic similarity search:
Pinecone is a fully managed, serverless vector database optimized for production-scale agent systems. It supports hybrid search (dense + sparse vectors), metadata filtering, and namespaces for multi-tenant isolation. Widely used in enterprise RAG pipelines.
Weaviate is an open-source vector database with built-in vectorization modules, hybrid search combining BM25 and vector similarity, and a GraphQL API. It supports generative search where retrieved results are fed directly to LLMs for answer synthesis.
Chroma is a lightweight, local-first embedding database popular for prototyping and development. Its simplicity and Python-native API make it the default choice for LangChain and LlamaIndex tutorials, though it scales to production with client-server mode.
Milvus is a distributed vector database designed for billion-scale datasets. It supports multiple index types (FAISS, HNSW, ScaNN), GPU acceleration, and dynamic schema. Used by major tech companies for production search and recommendation.
All of these support the core approximate nearest neighbor (ANN) search and maximum inner product search (MIPS) operations that underpin retrieval-augmented generation.
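The retrieval primitives these databases accelerate can be shown with a brute-force sketch: cosine similarity for ANN-style search and inner products for MIPS. Real systems replace this linear scan with HNSW or IVF indexes; the vectors below are toy data:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mips(query, vectors):
    """Return the index of the vector with the largest inner product."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    return max(range(len(scores)), key=scores.__getitem__)

memory_vectors = [
    [0.9, 0.1, 0.0],   # e.g. "user preferences"
    [0.0, 0.8, 0.6],   # e.g. "deployment events"
    [0.1, 0.2, 0.9],   # e.g. "schema facts"
]
query = [1.0, 0.0, 0.1]  # embedding of "what does the user like?"

best = mips(query, memory_vectors)
print(best)  # → 0 (the "user preferences" vector)
print(round(cosine(query, memory_vectors[best]), 3))
```

An index like HNSW trades a small amount of recall for sub-linear query time, which is what makes billion-scale memory retrieval feasible.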
Effective long-term memory requires not just storage but active management:
Memory Consolidation involves promoting important short-term observations to long-term storage. MemGPT/Letta agents self-edit their memory, deciding what to archive from the conversation context. Google's Memory Bank (2025) automates extraction of key facts from agent interactions, storing them persistently via REST API. KAIROS, a background daemon in the Claude Code architecture also known as 'Dream Mode', takes an alternative consolidation approach inspired by human sleep: after 24 hours of inactivity it performs reflective passes over memory files, synthesizing learnings into durable, well-organized data and maintaining a lightweight index for future agent sessions2).
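The consolidation step can be sketched as a filter that scores each short-term observation and promotes only the important ones. The keyword heuristic and threshold below are purely illustrative stand-ins; frameworks like MemGPT have the LLM itself make this decision via tool calls:

```python
# Markers that crudely signal durable, user-specific information
IMPORTANT_MARKERS = ("prefers", "always", "never", "deadline", "decided")

def importance(observation: str) -> float:
    """Keyword heuristic standing in for an LLM-based importance score."""
    text = observation.lower()
    return float(sum(marker in text for marker in IMPORTANT_MARKERS))

def consolidate(short_term: list, long_term: list, threshold: float = 1.0):
    """Promote sufficiently important observations to long-term memory."""
    for obs in short_term:
        if importance(obs) >= threshold:
            long_term.append(obs)
    short_term.clear()  # the context window is flushed either way

short_term = [
    "User said hello",
    "User prefers tabs over spaces",
    "The project deadline is Friday",
]
long_term = []
consolidate(short_term, long_term)
print(long_term)  # → ['User prefers tabs over spaces', 'The project deadline is Friday']
```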
Forgetting and Pruning prevents unbounded memory growth and removes stale information. Strategies include time-decay functions that deprioritize older memories, LRU (least recently used) eviction, attention-weighted persistence scoring, and Ebbinghaus forgetting curves. SAGE (Liang et al., 2024)3) pioneered reflection-based forgetting using Ebbinghaus curves to balance retention and pruning.
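A time-decay pruning pass inspired by the Ebbinghaus curve can be sketched as retention R = exp(-t/S), where t is time since last access and S is a per-memory stability parameter; memories whose retention falls below a threshold are dropped. The parameter values here are illustrative, not from any cited system:

```python
import math

def retention(hours_since_access: float, stability: float = 24.0) -> float:
    """Ebbinghaus-style exponential decay: R = exp(-t/S)."""
    return math.exp(-hours_since_access / stability)

def prune(memories, threshold: float = 0.3):
    """memories: list of (text, hours_since_access, stability) tuples."""
    return [
        (text, age, s)
        for text, age, s in memories
        if retention(age, s) >= threshold
    ]

memories = [
    ("User's name is Alice", 2.0, 24.0),   # fresh -> kept
    ("Asked about weather", 72.0, 24.0),   # stale, low stability -> pruned
    ("Prefers UUID keys", 72.0, 120.0),    # old but stable -> kept
]
kept = prune(memories)
print([m[0] for m in kept])  # → ["User's name is Alice", 'Prefers UUID keys']
```

Raising a memory's stability each time it is recalled reproduces the spaced-repetition effect: frequently used memories decay more slowly.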
Contradiction Resolution addresses conflicting information accumulated over time. Zep's temporal knowledge graphs track when facts were established and updated, enabling the agent to prefer newer information when contradictions arise.
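The prefer-newer rule can be sketched with a fact store that records when each fact was established and lets a later assertion about the same subject and attribute supersede an earlier one, in the spirit of (but much simpler than) Zep's temporal knowledge graphs. The `FactStore` class is hypothetical:

```python
from datetime import datetime

class FactStore:
    def __init__(self):
        # (subject, attribute) -> (value, established_at)
        self.facts = {}

    def assert_fact(self, subject, attribute, value, established_at):
        key = (subject, attribute)
        current = self.facts.get(key)
        # Prefer newer information when facts conflict
        if current is None or established_at > current[1]:
            self.facts[key] = (value, established_at)

    def lookup(self, subject, attribute):
        entry = self.facts.get((subject, attribute))
        return entry[0] if entry else None

store = FactStore()
store.assert_fact("alice", "employer", "Acme", datetime(2024, 1, 5))
store.assert_fact("alice", "employer", "Globex", datetime(2025, 3, 1))
store.assert_fact("alice", "employer", "OldCo", datetime(2023, 6, 1))  # stale, ignored
print(store.lookup("alice", "employer"))  # → Globex
```

A production system would also retain the superseded values with validity intervals, so the agent can answer "where did Alice work in 2024?" rather than only the current state.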
Several frameworks provide integrated long-term memory for agents:
MemGPT (Packer et al., 2023)4) treats the LLM as a processor with an OS-like memory hierarchy: core memory (always in context, like RAM), recall memory (searchable conversation history, like disk cache), and archival memory (long-term vector storage). Agents use tool calls to page memories in and out, self-editing their persistent state.
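The paging behavior can be sketched as a fixed-capacity core memory that spills its oldest entries to archival storage, with a search tool to page them back into context. Class and method names here are illustrative, not the actual MemGPT/Letta API:

```python
class HierarchicalMemory:
    def __init__(self, core_capacity: int = 3):
        self.core = []       # always in the prompt, like RAM
        self.archival = []   # long-term store, reached via tool calls
        self.core_capacity = core_capacity

    def remember(self, item: str):
        """Add to core memory, paging out the oldest entries on overflow."""
        self.core.append(item)
        while len(self.core) > self.core_capacity:
            self.archival.append(self.core.pop(0))

    def archival_search(self, keyword: str):
        """Tool the agent calls to page old memories back into context."""
        return [m for m in self.archival if keyword.lower() in m.lower()]

mem = HierarchicalMemory(core_capacity=2)
for note in ["likes Rust", "deadline Friday", "uses Postgres", "timezone UTC"]:
    mem.remember(note)

print(mem.core)                     # → ['uses Postgres', 'timezone UTC']
print(mem.archival_search("rust"))  # → ['likes Rust']
```

The key idea the sketch preserves: nothing is lost when the context window fills; it merely moves down the hierarchy and costs a tool call to retrieve.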
Letta (evolved from MemGPT) extends this with cloud sync for cross-session agent persistence, a production-ready server, and multi-agent memory sharing. It represents the most complete implementation of hierarchical agent memory.
Zep provides low-latency (<200ms) memory with temporal knowledge graphs via the Graphiti engine. It scores 63.8% on LongMemEval and supports SOC2/HIPAA compliance. The open-source Graphiti library has ~24K GitHub stars.
Mem0 offers a lightweight memory layer that extracts atomic facts from conversations, scoped by user, session, or agent. The Pro tier adds knowledge graph support via Neo4j. It integrates with any LLM application and uses Qdrant or Chroma as vector backends.
Google Memory Bank (2025) provides framework-agnostic persistent memory for agents via the Agent Development Kit (ADK), automatically extracting and recalling information without manual memory management code.