====== Agent Memory Persistence ======
Persisting agent state across sessions using database backends, file storage, vector store persistence, and conversation serialization. Comparison of approaches with implementation patterns.
===== Overview =====
Every time you start a new session with an LLM agent, it wakes up blank. It does not remember preferences, project context, decisions from last week, or anything from prior sessions. This "goldfish memory" problem is the single biggest practical limitation of LLM agents for real-world use.
Memory persistence solves this by storing agent state durably and retrieving it at session start. The right approach depends on your scale, query patterns, and whether you need semantic retrieval or structured lookups.
===== Types of Agent Memory =====
^ Memory Type ^ Purpose ^ Retention ^ Example ^
| Short-Term | Current conversation context | Session | Recent messages in chat |
| Long-Term | User preferences, learned facts | Indefinite | "User prefers Python over JS" |
| Episodic | Past interactions and task history | Time-decayed | "Fixed bug X on March 15" |
| Semantic | Domain knowledge and relationships | Indefinite | Entity relationships, concepts |
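The four types above can share a single record shape distinguished by a type tag. A minimal sketch; the ''MemoryRecord'' name and its fields are illustrative, not taken from any framework:
<code python>
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryRecord:
    # "short_term" | "long_term" | "episodic" | "semantic"
    memory_type: str
    content: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    relevance: float = 1.0  # decayed over time for episodic memories

fact = MemoryRecord("long_term", "User prefers Python over JS")
episode = MemoryRecord("episodic", "Fixed bug X on March 15", relevance=0.8)
</code>
A uniform record shape lets one storage backend serve all four memory types, with ''memory_type'' as an index key.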
===== Memory Architecture =====
<code>
graph TD
    A[Agent Session Start] --> B[Load Memory]
    B --> C["Short-Term: Conversation Buffer"]
    B --> D["Long-Term: Database/File"]
    B --> E["Semantic: Vector Store"]

    F[Agent Processing] --> G{Memory Write?}
    G -->|New Fact| D
    G -->|Embedding| E
    G -->|Context| C

    H[Agent Session End] --> I[Persist Short-Term Summary]
    I --> D
    I --> E

    subgraph "Retrieval at Query Time"
        J[User Query] --> K[Embed Query]
        K --> L[Vector Similarity Search]
        L --> M[Ranked Memories]
        J --> N[Key-Value Lookup]
        N --> O[Structured Facts]
        M --> P[Inject into Prompt]
        O --> P
    end
</code>
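The retrieval branch of the diagram ends with both paths merging into the prompt. A minimal sketch of that final injection step; ''build_prompt'' and its parameters are hypothetical names standing in for real retrieval results:
<code python>
def build_prompt(query: str, ranked_memories: list[str],
                 structured_facts: dict[str, str]) -> str:
    """Inject ranked memories and structured facts into the agent's prompt."""
    facts = "\n".join(f"- {k}: {v}" for k, v in structured_facts.items())
    memories = "\n".join(f"- {m}" for m in ranked_memories)
    return (
        "Known facts about the user:\n" + facts
        + "\n\nRelevant past memories:\n" + memories
        + "\n\nUser query: " + query
    )

prompt = build_prompt(
    "What language should the new service use?",
    ranked_memories=["Fixed bug X on March 15"],
    structured_facts={"preferred_language": "Python"},
)
</code>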
===== Approach Comparison =====
^ Criterion ^ Database (Redis/PG/SQLite) ^ File Storage ^ Vector Store ^ Serialization ^
| Durability | High (ACID for PG/SQLite) | Medium (filesystem) | Medium (backend-dependent) | Low (app-level) |
| Scalability | High (sharding/clustering) | Low (file limits) | High (distributed) | Medium |
| Query Speed | 1-50ms (indexed) | 1-100ms (file ops) | 10-50ms (similarity) | N/A (full load) |
| Semantic Search | No (without extension) | No | Yes | No |
| Complexity | Medium | Low | Medium | Low |
| Best For | Structured state, multi-agent | Episodic logs, prototypes | Semantic recall | Quick prototypes |
===== Database Backends =====
==== PostgreSQL with pgvector ====
Best for production systems needing both structured queries and semantic search.
* ACID transactions for reliable state persistence
* pgvector extension enables vector similarity search alongside relational queries
* Auditable with full query logging
* Scales well with connection pooling and read replicas
==== Redis ====
Best for high-speed session state and short-term memory caching.
* In-memory speed: 1-10ms reads/writes
* Supports vectors via RediSearch module
* Pub/sub for real-time multi-agent coordination
* Configure AOF persistence for durability
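Durability depends on persistence settings. A minimal ''redis.conf'' fragment with an illustrative key layout (the key names are assumptions, not a standard):
<code>
# redis.conf: enable append-only-file persistence so session state
# survives restarts (RDB snapshots alone can lose recent writes)
appendonly yes
appendfsync everysec        # fsync once per second: bounded data loss

# Example key layout (illustrative):
#   agent:{agent_id}:buffer   -> LIST of recent messages (short-term)
#   agent:{agent_id}:facts    -> HASH of structured long-term facts
</code>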
==== SQLite ====
Best for single-agent systems, prototypes, and embedded deployments.
* Serverless, zero configuration
* Full SQL support in a single file
* Limited write concurrency (one writer at a time via file locking)
* Row count is rarely the bottleneck; concurrent writers are
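A complete single-file store fits in a few dozen lines of standard-library Python. The ''SQLiteMemory'' class and its schema are a sketch, not a fixed interface:
<code python>
import sqlite3

class SQLiteMemory:
    """Minimal single-file agent memory store; schema is illustrative."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            """CREATE TABLE IF NOT EXISTS memories (
                   id INTEGER PRIMARY KEY,
                   agent_id TEXT NOT NULL,
                   memory_type TEXT NOT NULL,
                   content TEXT NOT NULL,
                   created_at TEXT DEFAULT (datetime('now'))
               )"""
        )

    def store(self, agent_id: str, content: str, memory_type: str = "long_term"):
        with self.db:  # implicit transaction, committed on success
            self.db.execute(
                "INSERT INTO memories (agent_id, memory_type, content) VALUES (?, ?, ?)",
                (agent_id, memory_type, content),
            )

    def recall(self, agent_id: str, memory_type: str, limit: int = 20) -> list[str]:
        rows = self.db.execute(
            """SELECT content FROM memories
               WHERE agent_id = ? AND memory_type = ?
               ORDER BY created_at DESC, id DESC LIMIT ?""",
            (agent_id, memory_type, limit),
        )
        return [r[0] for r in rows]

mem = SQLiteMemory()
mem.store("agent-1", "User prefers Python over JS")
print(mem.recall("agent-1", "long_term"))  # ['User prefers Python over JS']
</code>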
===== Implementation: Database-Backed Memory =====
<code python>
import json
from typing import Optional

import asyncpg


def to_pgvector(embedding: Optional[list[float]]) -> Optional[str]:
    """Render a float list in pgvector's text format, e.g. '[0.1,0.2]'.
    Needed because asyncpg cannot bind a Python list to a vector column
    directly; alternatively, register pgvector's asyncpg codec."""
    if embedding is None:
        return None
    return "[" + ",".join(map(str, embedding)) + "]"


class AgentMemoryStore:
    """Persistent agent memory using PostgreSQL with pgvector."""

    def __init__(self, dsn: str):
        self.dsn = dsn
        self.pool: Optional[asyncpg.Pool] = None

    async def initialize(self):
        self.pool = await asyncpg.create_pool(self.dsn)
        async with self.pool.acquire() as conn:
            await conn.execute("""
                CREATE EXTENSION IF NOT EXISTS vector;
                CREATE TABLE IF NOT EXISTS agent_memories (
                    id SERIAL PRIMARY KEY,
                    agent_id TEXT NOT NULL,
                    memory_type TEXT NOT NULL,
                    content TEXT NOT NULL,
                    embedding vector(1536),
                    metadata JSONB DEFAULT '{}',
                    created_at TIMESTAMPTZ DEFAULT NOW(),
                    accessed_at TIMESTAMPTZ DEFAULT NOW(),
                    relevance_score FLOAT DEFAULT 1.0
                );
                CREATE INDEX IF NOT EXISTS idx_memories_agent
                    ON agent_memories(agent_id, memory_type);
                CREATE INDEX IF NOT EXISTS idx_memories_embedding
                    ON agent_memories USING ivfflat (embedding vector_cosine_ops);
            """)

    async def store(
        self,
        agent_id: str,
        content: str,
        memory_type: str = "long_term",
        embedding: Optional[list[float]] = None,
        metadata: Optional[dict] = None,
    ):
        async with self.pool.acquire() as conn:
            await conn.execute(
                """INSERT INTO agent_memories
                       (agent_id, memory_type, content, embedding, metadata)
                   VALUES ($1, $2, $3, $4::vector, $5)""",
                agent_id, memory_type, content, to_pgvector(embedding),
                json.dumps(metadata or {}),
            )

    async def recall_semantic(
        self,
        agent_id: str,
        query_embedding: list[float],
        limit: int = 10,
        score_threshold: float = 0.7,
    ) -> list[dict]:
        """Cosine-similarity recall; <=> is pgvector's cosine distance operator."""
        async with self.pool.acquire() as conn:
            rows = await conn.fetch(
                """SELECT content, metadata,
                          1 - (embedding <=> $2::vector) AS similarity
                   FROM agent_memories
                   WHERE agent_id = $1
                     AND 1 - (embedding <=> $2::vector) > $3
                   ORDER BY similarity DESC
                   LIMIT $4""",
                agent_id, to_pgvector(query_embedding), score_threshold, limit,
            )
            return [dict(r) for r in rows]

    async def recall_recent(
        self, agent_id: str, memory_type: str, limit: int = 20
    ) -> list[dict]:
        async with self.pool.acquire() as conn:
            rows = await conn.fetch(
                """SELECT content, metadata, created_at
                   FROM agent_memories
                   WHERE agent_id = $1 AND memory_type = $2
                   ORDER BY created_at DESC LIMIT $3""",
                agent_id, memory_type, limit,
            )
            return [dict(r) for r in rows]
</code>
===== File-Based Memory =====
The simplest approach: store memory as human-readable markdown files. Used by soul.py and Claude's MEMORY.md pattern.
* Zero dependencies, human-readable, easy to debug
* Files organized by date or topic
* Agent reads memory files at session start, writes updates as it works
* Limited concurrency and no semantic search without additional tooling
Structure: ''SOUL.md'' (identity/persona) + ''MEMORY.md'' (curated long-term facts) + ''memory/YYYY-MM-DD.md'' (daily session logs).
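That layout can be driven by a few standard-library helpers; the function names here are illustrative:
<code python>
import tempfile
from datetime import date
from pathlib import Path

def log_episode(root: Path, note: str) -> None:
    """Append a note to today's session log (memory/YYYY-MM-DD.md)."""
    log_dir = root / "memory"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / f"{date.today().isoformat()}.md"
    with log_file.open("a") as f:
        f.write(f"- {note}\n")

def load_context(root: Path) -> str:
    """Concatenate persona, curated facts, and today's log at session start."""
    parts = []
    for name in ("SOUL.md", "MEMORY.md"):
        p = root / name
        if p.exists():
            parts.append(p.read_text())
    today = root / "memory" / f"{date.today().isoformat()}.md"
    if today.exists():
        parts.append(today.read_text())
    return "\n\n".join(parts)

# Demo against a throwaway directory
root = Path(tempfile.mkdtemp())
(root / "MEMORY.md").write_text("User prefers Python over JS")
log_episode(root, "Fixed bug X")
context = load_context(root)
</code>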
===== Vector Store Persistence =====
Persist embeddings for semantic retrieval using Qdrant, Chroma, pgvector, or similar.
* Store facts extracted from conversations as embeddings
* Retrieve relevant memories via similarity search at query time
* Combine with structured storage for hybrid retrieval
* Frameworks like Mem0 automate fact extraction from conversations and store in vector backends
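The mechanics reduce to cosine similarity over stored embeddings. The class below is a toy in-memory stand-in for a real vector store such as Qdrant or Chroma, with 2-dimensional vectors standing in for real model embeddings:
<code python>
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class TinyVectorMemory:
    """Toy vector store: linear scan; real backends use ANN indexes."""

    def __init__(self):
        self._items: list[tuple[list[float], str]] = []

    def add(self, embedding: list[float], fact: str) -> None:
        self._items.append((embedding, fact))

    def search(self, query: list[float], k: int = 3,
               threshold: float = 0.7) -> list[str]:
        scored = [(cosine(query, e), f) for e, f in self._items]
        scored = [(s, f) for s, f in scored if s > threshold]
        scored.sort(reverse=True)
        return [f for _, f in scored[:k]]

store = TinyVectorMemory()
store.add([1.0, 0.0], "User prefers Python over JS")
store.add([0.0, 1.0], "Fixed bug X on March 15")
print(store.search([0.9, 0.1]))  # ['User prefers Python over JS']
</code>
The threshold filter mirrors the ''score_threshold'' used in database-backed semantic recall: memories below the cutoff are treated as irrelevant rather than padded into the prompt.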
===== Conversation Serialization =====
Dump full chat histories or agent states to JSON/YAML for reload.
* Simple and portable -- works with any storage backend
* Bloats context without indexing or summarization
* LangGraph uses checkpoint-based serialization: ''graph.get_state(config)'' restores a thread's saved state
* Best used with summarization to compress old conversations before storage
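A minimal sketch of serialize-with-compression; the one-line summary is a placeholder where a real system would use an LLM-generated summary:
<code python>
import json
import tempfile
from pathlib import Path

def save_conversation(path: Path, messages: list[dict], keep_last: int = 4) -> None:
    """Serialize a chat history, collapsing older turns into a summary stub."""
    old, recent = messages[:-keep_last], messages[-keep_last:]
    state = {
        "summary": f"{len(old)} earlier messages omitted" if old else "",
        "messages": recent,
    }
    path.write_text(json.dumps(state, indent=2))

def load_conversation(path: Path) -> dict:
    return json.loads(path.read_text())

path = Path(tempfile.mkdtemp()) / "conversation.json"
msgs = [{"role": "user", "content": f"msg {i}"} for i in range(6)]
save_conversation(path, msgs)
state = load_conversation(path)  # 4 recent messages plus a summary line
</code>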
===== Memory Lifecycle Management =====
* **Temporal decay** -- Reduce relevance scores over time; forget stale memories
* **Consolidation** -- Periodically summarize episodic memories into long-term facts
* **Deduplication** -- Merge similar memories to prevent bloat
* **Capacity limits** -- Set maximum memory counts per agent and evict by relevance
* **Privacy** -- Implement deletion APIs for user data removal (GDPR compliance)
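Temporal decay and capacity limits combine naturally; the half-life model and 30-day default below are illustrative choices, not standards:
<code python>
import time

def decayed_score(base: float, created_at: float, now: float,
                  half_life_days: float = 30.0) -> float:
    """Exponentially decay a relevance score by age (half-life model)."""
    age_days = (now - created_at) / 86400
    return base * 0.5 ** (age_days / half_life_days)

def evict(memories: list[dict], max_count: int, now: float) -> list[dict]:
    """Keep the max_count most relevant memories after applying decay."""
    ranked = sorted(
        memories,
        key=lambda m: decayed_score(m["relevance"], m["created_at"], now),
        reverse=True,
    )
    return ranked[:max_count]

now = time.time()
mems = [
    {"content": "old fact", "relevance": 1.0, "created_at": now - 90 * 86400},
    {"content": "new fact", "relevance": 0.6, "created_at": now},
]
print([m["content"] for m in evict(mems, 1, now)])  # ['new fact']
</code>
Here the 90-day-old memory decays from 1.0 to 0.125 (three half-lives), so the fresh but lower-scored memory survives eviction.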
===== Frameworks =====
* **Mem0** -- Automated memory extraction, vector + graph storage, multi-provider support
* **LangGraph** -- Checkpoint-based persistence with pluggable backends (SQLite, PostgreSQL, Redis)
* **soul.py** -- File-based memory with markdown files, zero dependencies
* **CrewAI / AutoGen** -- Message list serialization with configurable storage
===== References =====
* [[https://sparkco.ai/blog/persistent-memory-for-ai-agents-comparing-pag-memorymd-and-sqlite-approaches|Comparing PAG, MEMORY.md, and SQLite Approaches]]
* [[https://vectorize.io/articles/best-ai-agent-memory-systems|Best AI Agent Memory Systems]]
* [[https://47billion.com/blog/ai-agent-memory-types-implementation-best-practices/|AI Agent Memory: Types, Implementation, Best Practices]]
* [[https://dev.to/foxgem/ai-agent-memory-a-comparative-analysis-of-langgraph-crewai-and-autogen-31dp|Comparative Analysis: LangGraph, CrewAI, AutoGen]]
* [[https://themenonlab.blog/blog/soul-py-persistent-memory-llm-agents-guide|soul.py: Persistent Memory for LLM Agents]]
* [[https://oneuptime.com/blog/post/2026-01-30-agent-memory/view|How to Create Agent Memory]]
===== See Also =====
* [[agent_error_recovery|Agent Error Recovery]]
* [[tool_result_parsing|Tool Result Parsing]]
* [[agent_context_management|Agent Context Management]]