====== How to Add Memory to an Agent ======
A practical guide to implementing memory systems for AI agents. Memory allows agents to maintain context across conversations, personalize responses, and learn from past interactions.
===== Memory Architecture Overview =====
<code>
graph TB
    subgraph Short-Term
        A[Conversation Buffer] --> C[Working Memory]
        B[Sliding Window] --> C
    end
    subgraph Long-Term
        D[Vector Store Memory]
        E[Knowledge Graph Memory]
        F[SQL/Structured Store]
    end
    C -->|Summarize & Persist| D
    C -->|Extract Entities| E
    C -->|Store Facts| F
    G[User Query] --> H[Memory Router]
    H --> C
    H --> D
    H --> E
    D --> I[Context for LLM]
    E --> I
    C --> I
    I --> J[Agent Response]
</code>
===== Memory Types Compared =====
^ Memory Type ^ How It Works ^ Best For ^ Trade-offs ^
| **Conversation Buffer** | Stores full recent conversation | Simple chatbots, short sessions | High token usage, forgets old context |
| **Sliding Window** | Fixed-size window of recent N turns | Ongoing sessions needing recency | Fixed size limits depth |
| **Vector Store** | Embeds interactions, retrieves by similarity | Semantic recall over large histories | Weak on relationships, embedding cost |
| **Knowledge Graph** | Entities as nodes, relations as edges | Multi-hop reasoning, complex domains | Setup complexity, graph query overhead |
===== When to Use Which =====
<code>
graph TD
    A[Need memory?] --> B{Session length?}
    B -->|Short, < 10 turns| C[Conversation Buffer]
    B -->|Medium, 10-50 turns| D[Sliding Window]
    B -->|Long / Cross-session| E{What kind of recall?}
    E -->|Fuzzy / Semantic| F[Vector Store Memory]
    E -->|Relational / Structured| G[Knowledge Graph Memory]
    E -->|Both| H[Hybrid: Vector + Graph]
    F --> I[Use Mem0 or Qdrant]
    G --> J[Use Neo4j]
    H --> K[Use Mem0 with graph mode]
</code>
===== Approach 1: Conversation Buffer Memory =====
The simplest form: store the full conversation history and pass all of it as context on every request.
<code python>
from openai import OpenAI

client = OpenAI()

class ConversationBufferMemory:
    def __init__(self):
        self.messages = []

    def add_user_message(self, content: str):
        self.messages.append({"role": "user", "content": content})

    def add_assistant_message(self, content: str):
        self.messages.append({"role": "assistant", "content": content})

    def get_context(self) -> list:
        return self.messages.copy()

    def clear(self):
        self.messages = []

memory = ConversationBufferMemory()

def chat(user_input: str) -> str:
    memory.add_user_message(user_input)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."}
        ] + memory.get_context()
    )
    reply = response.choices[0].message.content
    memory.add_assistant_message(reply)
    return reply

# Usage
print(chat("My name is Alice"))
print(chat("What is my name?"))  # Remembers: "Alice"
</code>
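The buffer's weakness is unbounded growth: every turn is carried forward, so token usage scales linearly with conversation length. A rough offline sketch (word count as a crude token proxy; no API calls, and the message texts are made up):

```python
def approx_tokens(messages: list) -> int:
    # Crude proxy: one "token" per whitespace-separated word.
    # A real system would use a tokenizer such as tiktoken.
    return sum(len(m["content"].split()) for m in messages)

messages = []
for turn in range(100):
    messages.append({"role": "user", "content": "a question of exactly six words"})
    messages.append({"role": "assistant", "content": "an answer of exactly six words"})

# 100 turns x 12 words each: the context sent to the model keeps growing
print(approx_tokens(messages))  # 1200
```

At real conversation lengths this is what drives both cost and context-window exhaustion, which motivates the sliding window below.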
===== Approach 2: Sliding Window Memory =====
Keep only the last N exchanges to control token usage while maintaining recent context.
<code python>
from collections import deque
from openai import OpenAI

client = OpenAI()

class SlidingWindowMemory:
    def __init__(self, window_size: int = 10):
        self.window = deque(maxlen=window_size * 2)  # *2 for user+assistant pairs

    def add_user_message(self, content: str):
        self.window.append({"role": "user", "content": content})

    def add_assistant_message(self, content: str):
        self.window.append({"role": "assistant", "content": content})

    def get_context(self) -> list:
        return list(self.window)

memory = SlidingWindowMemory(window_size=5)  # Keep last 5 exchanges

def chat(user_input: str) -> str:
    memory.add_user_message(user_input)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."}
        ] + memory.get_context()
    )
    reply = response.choices[0].message.content
    memory.add_assistant_message(reply)
    return reply
</code>
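Because ''deque(maxlen=...)'' evicts silently, the oldest turns simply vanish once the window fills. A standalone sketch of the eviction behavior (no API calls; the message texts are made up):

```python
from collections import deque

window = deque(maxlen=6)  # window_size=3 exchanges -> 6 messages
for i in range(1, 6):     # five exchanges
    window.append({"role": "user", "content": f"question {i}"})
    window.append({"role": "assistant", "content": f"answer {i}"})

# Exchanges 1 and 2 were evicted; only the last three survive.
print([m["content"] for m in window])
# ['question 3', 'answer 3', 'question 4', 'answer 4', 'question 5', 'answer 5']
```

This is the "fixed size limits depth" trade-off from the table: anything the user said before the window is gone unless a long-term store has captured it.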
===== Approach 3: Vector Store Memory =====
Embed past interactions and retrieve the most relevant ones via semantic search. Scales to thousands of past conversations.
<code python>
import uuid

from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = OpenAI()
qdrant = QdrantClient(":memory:")  # Use url="http://localhost:6333" for production

# Create collection
qdrant.create_collection(
    collection_name="agent_memory",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

def embed(text: str) -> list:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def store_memory(text: str, metadata: dict = None):
    vector = embed(text)
    point = PointStruct(
        id=str(uuid.uuid4()),
        vector=vector,
        payload={"text": text, **(metadata or {})}
    )
    qdrant.upsert(collection_name="agent_memory", points=[point])

def recall(query: str, top_k: int = 3) -> list[str]:
    vector = embed(query)
    results = qdrant.search(
        collection_name="agent_memory",
        query_vector=vector,
        limit=top_k
    )
    return [hit.payload["text"] for hit in results]

# Store interactions
store_memory("User prefers dark mode and works in fintech", {"user_id": "user123"})
store_memory("User's project deadline is March 2026", {"user_id": "user123"})
store_memory("User likes Python over JavaScript", {"user_id": "user123"})

# Recall relevant memories
memories = recall("What programming language does the user prefer?")
print(memories)  # Returns the Python preference memory

# Use in agent loop
def chat_with_memory(user_input: str) -> str:
    relevant_memories = recall(user_input, top_k=3)
    context = "\n".join(relevant_memories)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"You are a helpful assistant. Relevant memories:\n{context}"},
            {"role": "user", "content": user_input}
        ]
    )
    reply = response.choices[0].message.content
    store_memory(f"User: {user_input}\nAssistant: {reply}")
    return reply
</code>
===== Approach 4: Knowledge Graph Memory =====
Model memories as entities and relationships for structured, multi-hop reasoning.
<code python>
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("neo4j", "password")
)

def add_entity_relation(entity1: str, relation: str, entity2: str):
    with driver.session() as session:
        session.run(
            "MERGE (a:Entity {name: $e1}) "
            "MERGE (b:Entity {name: $e2}) "
            "MERGE (a)-[:RELATES {type: $rel}]->(b)",
            e1=entity1, e2=entity2, rel=relation
        )

def query_relations(entity: str, relation_type: str = None) -> list:
    with driver.session() as session:
        if relation_type:
            result = session.run(
                "MATCH (a:Entity {name: $name})-[r:RELATES {type: $rel}]->(b) "
                "RETURN b.name AS target",
                name=entity, rel=relation_type
            )
        else:
            result = session.run(
                "MATCH (a:Entity {name: $name})-[r:RELATES]->(b) "
                "RETURN r.type AS relation, b.name AS target",
                name=entity
            )
        return [dict(record) for record in result]

# Build knowledge graph from interactions
add_entity_relation("Alice", "WORKS_IN", "Fintech")
add_entity_relation("Alice", "PREFERS", "Dark Mode")
add_entity_relation("Alice", "USES", "Python")
add_entity_relation("Alice", "MANAGES", "Project Alpha")
add_entity_relation("Project Alpha", "DEADLINE", "March 2026")

# One-hop query: all direct relations from Alice
relations = query_relations("Alice")
print(relations)
# [{'relation': 'WORKS_IN', 'target': 'Fintech'},
#  {'relation': 'PREFERS', 'target': 'Dark Mode'}, ...]
</code>
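''query_relations'' answers one-hop questions; the payoff of a graph is multi-hop queries such as "what deadline applies to Alice?", which follows MANAGES and then DEADLINE. In Cypher that would be roughly ''MATCH (a:Entity {name: 'Alice'})-[:RELATES {type: 'MANAGES'}]->(p)-[:RELATES {type: 'DEADLINE'}]->(d) RETURN d.name''. To show what that traversal computes without a running Neo4j, here is an in-memory equivalent over the same triples:

```python
# The same triples as above, as a plain edge list.
edges = [
    ("Alice", "WORKS_IN", "Fintech"),
    ("Alice", "PREFERS", "Dark Mode"),
    ("Alice", "USES", "Python"),
    ("Alice", "MANAGES", "Project Alpha"),
    ("Project Alpha", "DEADLINE", "March 2026"),
]

def neighbors(node: str, rel: str) -> list:
    # All targets reachable from `node` over an edge of type `rel`
    return [t for (s, r, t) in edges if s == node and r == rel]

# Two hops: Alice -MANAGES-> project -DEADLINE-> date
deadlines = [d for p in neighbors("Alice", "MANAGES")
               for d in neighbors(p, "DEADLINE")]
print(deadlines)  # ['March 2026']
```

Vector search would struggle here: no single stored sentence links Alice to the deadline, so the answer only emerges by chaining relations.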
===== Using Mem0 (Production-Ready Memory Layer) =====
Mem0 provides a managed memory layer that combines vector, graph, and SQL storage with automatic summarization.
<code bash>
pip install mem0ai
</code>
<code python>
from mem0 import MemoryClient
from openai import OpenAI

client = MemoryClient(api_key="your-mem0-api-key")
llm = OpenAI()

# Add memories (auto-categorized: episodic, semantic, procedural)
client.add("User prefers dark mode and works in fintech.", user_id="user123")
client.add("User reviewed product X for project Y.", user_id="user123")

# Semantic search across all memory types
memories = client.search(query="user preferences", user_id="user123")
for m in memories:
    print(f"[{m['score']:.2f}] {m['memory']}")

# Use in agent loop
def agent_with_mem0(query: str, user_id: str) -> str:
    # Retrieve relevant memories
    memories = client.search(query=query, user_id=user_id)
    context = "\n".join([m["memory"] for m in memories])
    # Generate response with memory context
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Known about user:\n{context}"},
            {"role": "user", "content": query}
        ]
    )
    reply = response.choices[0].message.content
    # Store the new interaction
    client.add(f"Q: {query} A: {reply}", user_id=user_id)
    return reply
</code>
===== Using Letta (Stateful Agent Framework) =====
Letta provides persistent agents with built-in memory management, including episodic and archival memory tiers.
<code bash>
pip install letta
</code>
<code python>
from letta import ChatMemory, create_client

# Connect to a local Letta server (run: letta server)
client = create_client()

# Create an agent with built-in memory management
agent_state = client.create_agent(
    name="memory-agent",
    memory=ChatMemory(
        human="User is a developer working on AI projects.",
        persona="I am a helpful assistant with long-term memory."
    )
)

# Chat — Letta auto-manages memory tiers
response = client.send_message(
    agent_id=agent_state.id,
    role="user",
    message="I prefer Python and work in fintech."
)
print(response.messages)

# Memory persists across sessions automatically
response = client.send_message(
    agent_id=agent_state.id,
    role="user",
    message="What do you know about me?"
)
# The agent recalls preferences from the previous interaction
</code>
===== Hybrid Architecture: Best of All Worlds =====
For production systems, combine multiple memory types:
<code python>
from dataclasses import dataclass, field

@dataclass
class HybridMemory:
    buffer: list = field(default_factory=list)  # Short-term
    window_size: int = 10
    vector_store: object = None  # Qdrant/Chroma for semantic recall
    graph_store: object = None   # Neo4j for relational recall

    def add_interaction(self, user_msg: str, assistant_msg: str):
        # 1. Add to buffer (short-term)
        self.buffer.append({"user": user_msg, "assistant": assistant_msg})
        if len(self.buffer) > self.window_size:
            # 2. Summarize and store overflow in vector store
            overflow = self.buffer.pop(0)
            summary = f"User asked: {overflow['user']}. Response: {overflow['assistant']}"
            self.vector_store.store_memory(summary)
        # 3. Extract entities for the knowledge graph
        # Use NER or an LLM to extract (entity, relation, entity) triples
        # self.graph_store.add_entity_relation(...)

    def get_context(self, query: str) -> str:
        # Combine all memory sources
        recent = "\n".join([
            f"User: {m['user']}\nAssistant: {m['assistant']}"
            for m in self.buffer[-3:]
        ])
        semantic = "\n".join(self.vector_store.recall(query, top_k=3))
        # graph = self.graph_store.query_relations(...)
        return f"Recent:\n{recent}\n\nRelevant past:\n{semantic}"
</code>
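The overflow path can be exercised without Qdrant or an API key by stubbing the vector store with keyword overlap. ''FakeVectorStore'' and the trimmed copy of ''HybridMemory'' below are illustrative stand-ins, not production code:

```python
from dataclasses import dataclass, field

class FakeVectorStore:
    # Ranks stored summaries by word overlap with the query,
    # standing in for real embedding similarity.
    def __init__(self):
        self.memories = []
    def store_memory(self, text: str):
        self.memories.append(text)
    def recall(self, query: str, top_k: int = 3) -> list:
        q = set(query.lower().split())
        ranked = sorted(self.memories, key=lambda m: -len(q & set(m.lower().split())))
        return ranked[:top_k]

@dataclass
class HybridMemory:
    buffer: list = field(default_factory=list)
    window_size: int = 10
    vector_store: object = None

    def add_interaction(self, user_msg: str, assistant_msg: str):
        self.buffer.append({"user": user_msg, "assistant": assistant_msg})
        if len(self.buffer) > self.window_size:
            overflow = self.buffer.pop(0)
            self.vector_store.store_memory(
                f"User asked: {overflow['user']}. Response: {overflow['assistant']}"
            )

hm = HybridMemory(window_size=2, vector_store=FakeVectorStore())
for i in range(1, 5):
    hm.add_interaction(f"question {i}", f"answer {i}")

print(len(hm.buffer))  # 2 -> short-term stays bounded
print(hm.vector_store.recall("question 1", top_k=1))  # overflow is still recallable
```

The short-term buffer stays at the window size while evicted turns remain retrievable from the (fake) long-term store, which is exactly the division of labor the hybrid design aims for.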
===== Best Practices =====
* **Start simple** — Use conversation buffer for prototypes, upgrade to vector/graph as needed
* **Scope memories** — Isolate by user_id, session_id, or agent_id to prevent cross-contamination
* **Add decay** — Score memories by recency and relevance; auto-forget low-scoring entries
* **Summarize** — Periodically summarize old memories to reduce storage and improve retrieval
* **Compliance** — Encrypt stored memories; implement GDPR/HIPAA deletion on request
* **Test recall** — Regularly verify that the right memories surface for given queries
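The "add decay" bullet can be made concrete with a small scorer: retrieval similarity multiplied by an exponential recency factor, with low scorers eligible for deletion. The 30-day half-life and the sample values are illustrative assumptions:

```python
import time

HALF_LIFE_DAYS = 30.0  # illustrative: a memory's score halves every 30 days

def decayed_score(similarity: float, created_at: float, now: float = None) -> float:
    # similarity: relevance score from vector search (0..1)
    # created_at: unix timestamp when the memory was stored
    now = time.time() if now is None else now
    age_days = (now - created_at) / 86400
    recency = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return similarity * recency

now = time.time()
fresh = decayed_score(0.8, now - 1 * 86400, now)    # 1 day old
stale = decayed_score(0.8, now - 120 * 86400, now)  # 120 days old: 0.8 * 0.5**4
print(round(fresh, 3), round(stale, 3))
```

Memories scoring below a chosen threshold can be deleted or archived during periodic maintenance, keeping the store small and retrieval sharp.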
===== See Also =====
* [[how_to_build_a_rag_pipeline|How to Build a RAG Pipeline]]
* [[how_to_deploy_an_agent|How to Deploy an Agent]]
* [[how_to_evaluate_an_agent|How to Evaluate an Agent]]
{{tag>memory agents vector-store knowledge-graph mem0 letta how-to}}