====== Long-Term Memory ======

Long-term memory in AI agent systems refers to persistent storage mechanisms that allow agents to retain and retrieve information across sessions and over extended time periods. Unlike [[short_term_memory|short-term memory]], which is constrained by context window limits, long-term memory leverages external storage systems such as vector databases, [[knowledge_graphs|knowledge graphs]], or structured datastores to maintain an effectively unbounded repository of past experiences, learned facts, and procedural knowledge. This capability is critical for building agents that can accumulate knowledge, personalize their behavior, and avoid repeating past mistakes.

<code mermaid>
graph TD
    Agent[Agent] --> Emb[Embedding Model]
    Emb --> VS[Vector Store]
    VS --> Ret[Retrieval]
    Ret --> Ctx[Context Window]
    Agent --> KG[Knowledge Graph]
    KG --> Ret
    Ctx --> Agent
    style Agent fill:#69f,stroke:#333
    style VS fill:#f96,stroke:#333
    style KG fill:#f96,stroke:#333
</code>

===== Types of Long-Term Memory Systems =====

Long-term memory for agents can be categorized following cognitive science distinctions:

**Episodic Memory** stores records of specific past experiences and interactions, typically as timestamped sequences. An agent's episodic memory might contain conversation logs, task execution traces, or records of past decisions and their outcomes. Systems like Zep specialize in episodic memory with real-time summarization and temporal [[knowledge_graphs|knowledge graphs]]. Episodic memory enables agents to answer questions like "What did the user ask about last week?" or "How did I solve this problem before?"

**Semantic Memory** stores general world knowledge, facts, and concepts abstracted from specific experiences. This maps to [[explicit_memory|explicit/declarative memory]] and includes structured knowledge bases, entity relationships, and domain facts. [[mem0|Mem0]] excels at extracting atomic facts from conversations and storing them as user-scoped semantic memories.

**Procedural Memory** encodes learned skills and behavioral patterns, corresponding to [[implicit_memory|implicit memory]]. In agent systems, this manifests as fine-tuned [[modelweights|model weights]], learned tool-use patterns, and optimized prompt templates.

===== Persistent Context and Project-Scoped Memory =====

A complementary approach to long-term memory organizes information around persistent contexts or projects rather than purely by memory type. **Persistent context** groups chats, files, and custom instructions into dedicated containers (often called notebooks or homes) that persist across sessions and prevent AI systems from losing project-specific details(([[https://www.theneurondaily.com/p/chatgpt-gets-a-100-tier|The Neuron Daily - ChatGPT Gets a $100 Tier]])). This pattern lets users build on ongoing research, side projects, and collaborative work without repeatedly re-explaining context.

[[google|Google]]'s Gemini implements persistent context through notebooks that sync across applications like [[notebooklm|NotebookLM]], allowing seamless continuation of work across different tools and sessions. Unlike traditional retrieval-based memory that must be queried, persistent context is always implicitly available to the agent within its designated namespace, reducing cognitive load on both the agent and user. A minimal sketch of this pattern follows.
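The sketch below illustrates the persistent-context idea in Python. The ''ProjectContext'' class, its fields, and the on-disk layout are hypothetical (they are not taken from Gemini, NotebookLM, or any product API); the point is that instructions, files, and chat history live together in one named container that is loaded wholesale into the agent's prompt rather than queried piecewise.

<code python>
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path

@dataclass
class ProjectContext:
    """Hypothetical project-scoped container: everything inside it is
    implicitly available to the agent, with no retrieval query needed."""
    name: str
    instructions: str = ""                               # custom instructions for this project
    files: dict[str, str] = field(default_factory=dict)  # filename -> content
    chat_log: list[dict] = field(default_factory=list)   # prior conversation turns

    def save(self, root: Path = Path("./contexts")) -> None:
        # Persist the whole container so it survives across sessions
        root.mkdir(exist_ok=True)
        (root / f"{self.name}.json").write_text(json.dumps(asdict(self)))

    @classmethod
    def load(cls, name: str, root: Path = Path("./contexts")) -> "ProjectContext":
        return cls(**json.loads((root / f"{name}.json").read_text()))

    def to_prompt(self) -> str:
        """Render the entire container into the system prompt of a new session."""
        files = "\n".join(f"--- {n} ---\n{c}" for n, c in self.files.items())
        return f"Project: {self.name}\nInstructions: {self.instructions}\n{files}"

# New session: reload the project and continue where the user left off
ctx = ProjectContext(name="api_redesign", instructions="Prefer concise answers.")
ctx.files["schema.sql"] = "CREATE TABLE users (id UUID PRIMARY KEY);"
ctx.save()
resumed = ProjectContext.load("api_redesign")
print(resumed.to_prompt())
</code>

Unlike the retrieval example in the next section, nothing here is embedded or searched; the trade-off is that the whole container must fit within the context window.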
===== Python Example: ChromaDB Vector Store for Agent Memory =====

<code python>
import chromadb
from datetime import datetime

# Initialize a persistent ChromaDB client (data survives restarts)
client = chromadb.PersistentClient(path="./agent_memory_db")

# Create a collection for agent long-term memory
collection = client.get_or_create_collection(
    name="agent_memory",
    metadata={"hnsw:space": "cosine"},  # use cosine similarity
)

# Store memories with metadata for filtering
memories = [
    ("User prefers concise answers with code examples.", {"type": "preference", "user": "alice"}),
    ("Deployed the API to production on 2025-03-15.", {"type": "event", "user": "alice"}),
    ("The database schema uses UUID primary keys.", {"type": "fact", "user": "alice"}),
    ("User asked about HNSW indexing for vector search.", {"type": "topic", "user": "alice"}),
]
for i, (text, meta) in enumerate(memories):
    meta["timestamp"] = datetime.now().isoformat()
    collection.add(
        documents=[text],
        metadatas=[meta],
        ids=[f"mem_{i}"],
    )

# Retrieve relevant memories by semantic similarity
results = collection.query(
    query_texts=["What does the user like in responses?"],
    n_results=2,
)
print("Retrieved:", results["documents"])

# Filter by metadata (e.g., only preferences)
filtered = collection.query(
    query_texts=["user settings"],
    n_results=3,
    where={"type": "preference"},
)
print("Preferences:", filtered["documents"])

# Count total stored memories
print(f"Total memories: {collection.count()}")
</code>

===== Vector Database and Retrieval Approaches =====

Vector databases are the primary infrastructure for agent long-term memory, storing embedding representations for semantic similarity search:

**[[pinecone|Pinecone]]** is a fully managed, serverless vector database optimized for production-scale agent systems. It supports [[hybrid_search|hybrid search]] (dense + sparse vectors), metadata filtering, and namespaces for multi-tenant isolation. It is widely used in enterprise RAG pipelines.

**[[weaviate|Weaviate]]** is an open-source vector database with built-in vectorization modules, [[hybrid_search|hybrid search]] combining BM25 and vector similarity, and a GraphQL API. It supports generative search, where retrieved results are fed directly to LLMs for answer synthesis.

**Chroma** is a lightweight, local-first embedding database popular for prototyping and development. Its simplicity and Python-native API make it the default choice for [[langchain|LangChain]] and [[llamaindex|LlamaIndex]] tutorials, though it scales to production with client-server mode.

**[[milvus|Milvus]]** is a distributed vector database designed for billion-scale datasets. It supports multiple index types ([[faiss|FAISS]], [[hnsw_graphs|HNSW]], [[scann|ScaNN]]), GPU acceleration, and dynamic schema. It is used by major tech companies for production search and recommendation.

All of these support the [[approximate_nearest_neighbors|approximate nearest neighbor]] search and [[maximum_inner_product_search|maximum inner product search (MIPS)]] operations that underpin retrieval-augmented generation.

===== Memory Consolidation and Forgetting =====

Effective long-term memory requires not just storage but active management:

**Memory Consolidation** involves promoting important short-term observations to long-term storage. MemGPT/[[letta|Letta]] agents self-edit their memory, deciding what to archive from the conversation context. Google's Memory Bank (2025) automates extraction of key facts from agent interactions, storing them persistently via REST API. A sketch of this promotion step is shown below.
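The following sketch shows the consolidation step, reusing the ''collection'' from the ChromaDB example above (it assumes that block has already run). The ''should_consolidate'' gate and its importance threshold are illustrative assumptions, not the actual mechanism of MemGPT or Memory Bank, where the model itself decides what to archive.

<code python>
from datetime import datetime

def should_consolidate(observation: str, importance: float) -> bool:
    """Illustrative gate: in MemGPT-style systems the LLM makes this call
    via a memory-editing tool; here a fixed score threshold stands in."""
    return importance >= 0.7 and len(observation.split()) > 3

# Short-term buffer: (observation, importance score in [0, 1])
short_term_buffer = [
    ("User mentioned their production stack runs on Kubernetes.", 0.9),
    ("User said 'thanks'.", 0.1),
    ("User's deployment deadline is 2025-04-01.", 0.8),
]

# Promote qualifying observations into the persistent collection, then
# clear the buffer (its contents exist only in the context window)
for i, (obs, score) in enumerate(short_term_buffer):
    if should_consolidate(obs, score):
        collection.add(
            documents=[obs],
            metadatas=[{"type": "consolidated",
                        "importance": score,
                        "timestamp": datetime.now().isoformat()}],
            ids=[f"consolidated_{i}"],
        )
short_term_buffer.clear()
print(f"Total memories after consolidation: {collection.count()}")
</code>

In production frameworks the promotion decision is typically exposed to the model as a tool call rather than a hard-coded threshold, so the agent can weigh importance in context.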
KAIROS, a background daemon within the [[claude_code|Claude Code]] architecture (also known as 'Dream Mode'), represents an alternative consolidation approach inspired by human sleep patterns. KAIROS performs reflective passes over memory files after 24 hours of inactivity to synthesize learnings into durable, well-organized data, maintaining a lightweight index for future agent sessions(([[https://alphasignalai.substack.com/p/anthropics-512k-line-code-leak-reveals|Source: Anthropic's Code Leak Article]])).

**Forgetting and Pruning** prevents unbounded memory growth and removes stale information. Strategies include time-decay functions that deprioritize older memories, LRU (least recently used) eviction, attention-weighted persistence scoring, and Ebbinghaus forgetting curves. [[https://arxiv.org/abs/2409.00872|SAGE (Liang et al., 2024)]](([[https://arxiv.org/abs/2409.00872|Liang et al. "SAGE." arXiv:2409.00872, 2024.]])) pioneered reflection-based forgetting using Ebbinghaus curves to balance retention and pruning.

**Contradiction Resolution** addresses conflicting information accumulated over time. Zep's temporal [[knowledge_graphs|knowledge graphs]] track when facts were established and updated, enabling the agent to prefer newer information when contradictions arise.

===== Integration Patterns in Agent Architectures =====

Several frameworks provide integrated long-term memory for agents:

**MemGPT** ([[https://arxiv.org/abs/2310.08560|Packer et al., 2023]](([[https://arxiv.org/abs/2310.08560|Packer et al. "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560, 2023.]]))) treats the LLM as a processor with an OS-like memory hierarchy: core memory (always in context, like RAM), recall memory (searchable conversation history, like a disk cache), and archival memory (long-term vector storage). Agents use tool calls to page memories in and out, self-editing their persistent state.

**[[letta|Letta]]** (evolved from MemGPT) extends this with cloud sync for cross-session agent persistence, a production-ready server, and multi-agent memory sharing. It represents the most complete implementation of hierarchical agent memory.

**Zep** provides low-latency (<200ms) memory with temporal [[knowledge_graphs|knowledge graphs]] via the Graphiti engine. It scores 63.8% on LongMemEval and supports SOC2/HIPAA compliance. The open-source Graphiti library has ~24K [[github|GitHub]] stars.

**[[mem0|Mem0]]** offers a lightweight memory layer that extracts atomic facts from conversations, scoped by user, session, or agent. The Pro tier adds knowledge graph support via [[neo4j|Neo4j]]. It integrates with any LLM application and uses [[qdrant|Qdrant]] or Chroma as vector backends.

**[[google|Google]] Memory Bank** (2025) provides framework-agnostic persistent memory for agents via the Agent Development Kit (ADK), automatically extracting and recalling information without manual memory management code.

===== See Also =====

  * [[memory_retention|Memory Retention]]
  * [[agent_memory_frameworks|Agent Memory Frameworks]]
  * [[file_system_memory|File System-Based Memory for Agents]]
  * [[how_to_add_memory_to_an_agent|How to Add Memory to an Agent]]
  * [[short_term_memory|Short-Term Memory]]

===== References =====