Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
Persisting agent state across sessions using database backends, file storage, vector store persistence, and conversation serialization. Comparison of approaches with implementation patterns.
Every time you start a new session with an LLM agent, it wakes up blank. It does not remember preferences, project context, decisions from last week, or anything from prior sessions. This “goldfish memory” problem is the single biggest practical limitation of LLM agents for real-world use.
Memory persistence solves this by storing agent state durably and retrieving it at session start. The right approach depends on your scale, query patterns, and whether you need semantic retrieval or structured lookups.
| Memory Type | Purpose | Retention | Example |
|---|---|---|---|
| Short-Term | Current conversation context | Session | Recent messages in chat |
| Long-Term | User preferences, learned facts | Indefinite | “User prefers Python over JS” |
| Episodic | Past interactions and task history | Time-decayed | “Fixed bug X on March 15” |
| Semantic | Domain knowledge and relationships | Indefinite | Entity relationships, concepts |
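The retention column above can be made concrete with a small sketch. The class and field names below are illustrative, not from any library; the time-decay rule (episodic memories lose half their relevance per configurable half-life, other types keep full relevance) is one plausible policy:

```python
from dataclasses import dataclass, field
import time


@dataclass
class MemoryRecord:
    """One stored memory. Names and decay policy are illustrative."""
    content: str
    memory_type: str  # "short_term" | "long_term" | "episodic" | "semantic"
    created_at: float = field(default_factory=time.time)
    base_relevance: float = 1.0

    def effective_relevance(self, half_life_days: float = 30.0) -> float:
        # Episodic memories decay exponentially with age;
        # long-term and semantic memories keep full relevance.
        if self.memory_type != "episodic":
            return self.base_relevance
        age_days = (time.time() - self.created_at) / 86400
        return self.base_relevance * 0.5 ** (age_days / half_life_days)
```

At retrieval time, ranking by `effective_relevance()` lets fresh episodic memories outrank stale ones without ever deleting history.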
| Criterion | Database (Redis/PG/SQLite) | File Storage | Vector Store | Serialization |
|---|---|---|---|---|
| Durability | High (ACID for PG/SQLite) | Medium (filesystem) | Medium (backend-dependent) | Low (app-level) |
| Scalability | High (sharding/clustering) | Low (file limits) | High (distributed) | Medium |
| Query Speed | 1-50ms (indexed) | 1-100ms (file ops) | 10-50ms (similarity) | N/A (full load) |
| Semantic Search | No (without extension) | No | Yes | No |
| Complexity | Medium | Low | Medium | Low |
| Best For | Structured state, multi-agent | Episodic logs, prototypes | Semantic recall | Quick prototypes |
- **PostgreSQL + pgvector** — best for production systems needing both structured queries and semantic search.
- **Redis** — best for high-speed session state and short-term memory caching.
- **SQLite** — best for single-agent systems, prototypes, and embedded deployments.
```python
import json
from typing import Optional

import asyncpg


class AgentMemoryStore:
    """Persistent agent memory using PostgreSQL with pgvector."""

    def __init__(self, dsn: str):
        self.dsn = dsn
        self.pool = None

    async def initialize(self):
        self.pool = await asyncpg.create_pool(self.dsn)
        async with self.pool.acquire() as conn:
            await conn.execute("""
                CREATE EXTENSION IF NOT EXISTS vector;
                CREATE TABLE IF NOT EXISTS agent_memories (
                    id SERIAL PRIMARY KEY,
                    agent_id TEXT NOT NULL,
                    memory_type TEXT NOT NULL,
                    content TEXT NOT NULL,
                    embedding vector(1536),
                    metadata JSONB DEFAULT '{}',
                    created_at TIMESTAMPTZ DEFAULT NOW(),
                    accessed_at TIMESTAMPTZ DEFAULT NOW(),
                    relevance_score FLOAT DEFAULT 1.0
                );
                CREATE INDEX IF NOT EXISTS idx_memories_agent
                    ON agent_memories(agent_id, memory_type);
                CREATE INDEX IF NOT EXISTS idx_memories_embedding
                    ON agent_memories USING ivfflat (embedding vector_cosine_ops);
            """)

    async def store(
        self,
        agent_id: str,
        content: str,
        memory_type: str = "long_term",
        embedding: Optional[list[float]] = None,
        metadata: Optional[dict] = None,
    ):
        async with self.pool.acquire() as conn:
            await conn.execute(
                """INSERT INTO agent_memories
                       (agent_id, memory_type, content, embedding, metadata)
                   VALUES ($1, $2, $3, $4, $5)""",
                agent_id, memory_type, content, embedding,
                json.dumps(metadata or {}),
            )

    async def recall_semantic(
        self,
        agent_id: str,
        query_embedding: list[float],
        limit: int = 10,
        score_threshold: float = 0.7,
    ) -> list[dict]:
        async with self.pool.acquire() as conn:
            rows = await conn.fetch(
                """SELECT content, metadata,
                          1 - (embedding <=> $2) AS similarity
                   FROM agent_memories
                   WHERE agent_id = $1 AND 1 - (embedding <=> $2) > $3
                   ORDER BY similarity DESC
                   LIMIT $4""",
                agent_id, query_embedding, score_threshold, limit,
            )
            return [dict(r) for r in rows]

    async def recall_recent(
        self, agent_id: str, memory_type: str, limit: int = 20
    ) -> list[dict]:
        async with self.pool.acquire() as conn:
            rows = await conn.fetch(
                """SELECT content, metadata, created_at
                   FROM agent_memories
                   WHERE agent_id = $1 AND memory_type = $2
                   ORDER BY created_at DESC
                   LIMIT $3""",
                agent_id, memory_type, limit,
            )
            return [dict(r) for r in rows]
```
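The `1 - (embedding <=> $2)` expression in the semantic query converts pgvector's cosine *distance* operator into a similarity score in [-1, 1]. For reference, the same quantity in plain Python:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Equivalent to 1 - (a <=> b) under pgvector's vector_cosine_ops."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cosine_similarity([1.0, 0.0], [1.0, 0.0])  # identical direction -> 1.0
cosine_similarity([1.0, 0.0], [0.0, 1.0])  # orthogonal -> 0.0
```

This is why a `score_threshold` of 0.7 filters to memories whose embeddings point in nearly the same direction as the query, regardless of vector magnitude.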
The simplest approach: store memory as human-readable markdown files. Used by soul.py and Claude's MEMORY.md pattern.
Structure: `SOUL.md` (identity/persona) + `MEMORY.md` (curated long-term facts) + `memory/YYYY-MM-DD.md` (daily session logs).
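A minimal sketch of this layout, assuming the file names above; the helper functions are illustrative, not part of soul.py or any framework:

```python
from datetime import date
from pathlib import Path


def append_daily_log(root: Path, note: str) -> Path:
    """Append a note to memory/YYYY-MM-DD.md, creating files as needed."""
    log_dir = root / "memory"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / f"{date.today().isoformat()}.md"
    with log_file.open("a") as f:
        f.write(f"- {note}\n")
    return log_file


def load_context(root: Path) -> str:
    """Concatenate SOUL.md, MEMORY.md, and today's log for the system prompt."""
    parts = []
    for name in ("SOUL.md", "MEMORY.md"):
        path = root / name
        if path.exists():
            parts.append(path.read_text())
    today_log = root / "memory" / f"{date.today().isoformat()}.md"
    if today_log.exists():
        parts.append(today_log.read_text())
    return "\n\n".join(parts)
```

The appeal is that every layer is plain Markdown: the agent (or its operator) can read, edit, and curate memory with any text editor, and version-control it with git.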
Persist embeddings for semantic retrieval using Qdrant, Chroma, pgvector, or similar.
Dump full chat histories or agent states to JSON/YAML for reload.
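In its simplest form this is a JSON round-trip over the message list. A sketch, with an illustrative `Message` type standing in for whatever state object your agent uses:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Message:
    role: str
    content: str


def save_state(messages: list[Message], path: str) -> None:
    """Serialize the conversation to JSON on disk."""
    with open(path, "w") as f:
        json.dump([asdict(m) for m in messages], f, indent=2)


def load_state(path: str) -> list[Message]:
    """Reload a previously saved conversation."""
    with open(path) as f:
        return [Message(**m) for m in json.load(f)]
```

This gives durability only as good as your application code (no concurrency control, no partial-write protection), which is why the comparison table rates serialization low on durability but ideal for quick prototypes.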
Agent frameworks with built-in checkpointing expose the reload step as an API call, e.g. `graph.get_state(checkpoint_id)`.