AI Agent Knowledge Base

A shared knowledge base for AI agents

Agent Memory Persistence

Persisting agent state across sessions using database backends, file storage, vector store persistence, and conversation serialization. Comparison of approaches with implementation patterns.

Overview

Every time you start a new session with an LLM agent, it wakes up blank. It does not remember preferences, project context, decisions from last week, or anything from prior sessions. This “goldfish memory” problem is the single biggest practical limitation of LLM agents for real-world use.

Memory persistence solves this by storing agent state durably and retrieving it at session start. The right approach depends on your scale, query patterns, and whether you need semantic retrieval or structured lookups.

Types of Agent Memory

Memory Type | Purpose | Retention | Example
----------- | ------- | --------- | -------
Short-Term | Current conversation context | Session | Recent messages in chat
Long-Term | User preferences, learned facts | Indefinite | “User prefers Python over JS”
Episodic | Past interactions and task history | Time-decayed | “Fixed bug X on March 15”
Semantic | Domain knowledge and relationships | Indefinite | Entity relationships, concepts

Memory Architecture

graph TD
    A[Agent Session Start] --> B[Load Memory]
    B --> C[Short-Term: Conversation Buffer]
    B --> D[Long-Term: Database/File]
    B --> E[Semantic: Vector Store]
    F[Agent Processing] --> G{Memory Write?}
    G -->|New Fact| D
    G -->|Embedding| E
    G -->|Context| C
    H[Agent Session End] --> I[Persist Short-Term Summary]
    I --> D
    I --> E
    subgraph Retrieval at Query Time
        J[User Query] --> K[Embed Query]
        K --> L[Vector Similarity Search]
        L --> M[Ranked Memories]
        J --> N[Key-Value Lookup]
        N --> O[Structured Facts]
        M --> P[Inject into Prompt]
        O --> P
    end

Approach Comparison

Criterion | Database (Redis/PG/SQLite) | File Storage | Vector Store | Serialization
--------- | -------------------------- | ------------ | ------------ | -------------
Durability | High (ACID for PG/SQLite) | Medium (filesystem) | Medium (backend-dependent) | Low (app-level)
Scalability | High (sharding/clustering) | Low (file limits) | High (distributed) | Medium
Query Speed | 1-50ms (indexed) | 1-100ms (file ops) | 10-50ms (similarity) | N/A (full load)
Semantic Search | No (without extension) | No | Yes | No
Complexity | Medium | Low | Medium | Low
Best For | Structured state, multi-agent | Episodic logs, prototypes | Semantic recall | Quick prototypes

Database Backends

PostgreSQL with pgvector

Best for production systems needing both structured queries and semantic search.

  • ACID transactions for reliable state persistence
  • pgvector extension enables vector similarity search alongside relational queries
  • Auditable with full query logging
  • Scales well with connection pooling and read replicas

Redis

Best for high-speed session state and short-term memory caching.

  • In-memory speed: 1-10ms reads/writes
  • Supports vectors via RediSearch module
  • Pub/sub for real-time multi-agent coordination
  • Configure AOF persistence for durability
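The session-buffer pattern above can be sketched with redis-py's list commands (`LPUSH`, `LTRIM`, `EXPIRE`, `LRANGE`). The key scheme `agent:<id>:buffer`, the window size, and the tiny in-memory stand-in client (included only so the sketch runs without a server) are illustrative assumptions, not a fixed API.

```python
import json
import time

WINDOW = 50          # keep at most the 50 most recent messages
TTL_SECONDS = 3600   # drop the buffer after an hour of inactivity

def push_message(client, agent_id: str, role: str, content: str) -> None:
    """Append a message to the agent's short-term buffer, capped and TTL'd."""
    key = f"agent:{agent_id}:buffer"
    client.lpush(key, json.dumps({"role": role, "content": content, "ts": time.time()}))
    client.ltrim(key, 0, WINDOW - 1)  # evict everything beyond the window
    client.expire(key, TTL_SECONDS)   # refresh the session TTL on activity

def load_buffer(client, agent_id: str) -> list:
    """Return buffered messages, oldest first (LPUSH stores newest-first)."""
    raw = client.lrange(f"agent:{agent_id}:buffer", 0, -1)
    return [json.loads(m) for m in reversed(raw)]

class FakeRedis:
    """In-memory stand-in implementing just the four commands used above."""
    def __init__(self):
        self.lists = {}
    def lpush(self, key, value):
        self.lists.setdefault(key, []).insert(0, value)
    def ltrim(self, key, start, stop):
        self.lists[key] = self.lists.get(key, [])[start:stop + 1]
    def lrange(self, key, start, stop):
        data = self.lists.get(key, [])
        return data[start:] if stop == -1 else data[start:stop + 1]
    def expire(self, key, ttl):
        pass  # a real server would schedule key expiry

client = FakeRedis()  # with redis-py this would be: client = redis.Redis(...)
push_message(client, "a1", "user", "hi")
push_message(client, "a1", "assistant", "hello")
messages = load_buffer(client, "a1")
```

With a real Redis client the same functions work unchanged, and the TTL keeps abandoned sessions from accumulating.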

SQLite

Best for single-agent systems, prototypes, and embedded deployments.

  • Serverless, zero configuration
  • Full SQL support in a single file
  • Limited concurrency (file locking)
  • Comfortably handles large record counts; the practical limit is concurrent writers, not data volume
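A minimal sketch of an SQLite-backed fact store using only the standard-library `sqlite3` module; the table layout and key-value shape are illustrative assumptions.

```python
import sqlite3
from typing import Optional

def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a single-file memory store."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS facts (
            agent_id   TEXT NOT NULL,
            key        TEXT NOT NULL,
            value      TEXT NOT NULL,
            updated_at TEXT DEFAULT (datetime('now')),
            PRIMARY KEY (agent_id, key)
        )
    """)
    return conn

def remember(conn: sqlite3.Connection, agent_id: str, key: str, value: str) -> None:
    """Upsert a fact so re-learning it overwrites the stale value."""
    conn.execute(
        """INSERT INTO facts (agent_id, key, value) VALUES (?, ?, ?)
           ON CONFLICT(agent_id, key) DO UPDATE SET
               value = excluded.value, updated_at = datetime('now')""",
        (agent_id, key, value),
    )
    conn.commit()

def recall(conn: sqlite3.Connection, agent_id: str, key: str) -> Optional[str]:
    """Return the stored value, or None if the fact is unknown."""
    row = conn.execute(
        "SELECT value FROM facts WHERE agent_id = ? AND key = ?",
        (agent_id, key),
    ).fetchone()
    return row[0] if row else None

conn = open_memory()
remember(conn, "a1", "preferred_language", "Python")
```

Swapping `:memory:` for a file path gives durable storage with no other changes.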

Implementation: Database-Backed Memory

import json
from typing import Optional

import asyncpg
 
 
class AgentMemoryStore:
    """Persistent agent memory using PostgreSQL with pgvector."""
 
    def __init__(self, dsn: str):
        self.dsn = dsn
        self.pool = None
 
    async def initialize(self):
        self.pool = await asyncpg.create_pool(self.dsn)
        async with self.pool.acquire() as conn:
            await conn.execute("""
                CREATE EXTENSION IF NOT EXISTS vector;
                CREATE TABLE IF NOT EXISTS agent_memories (
                    id SERIAL PRIMARY KEY,
                    agent_id TEXT NOT NULL,
                    memory_type TEXT NOT NULL,
                    content TEXT NOT NULL,
                    embedding vector(1536),
                    metadata JSONB DEFAULT '{}',
                    created_at TIMESTAMPTZ DEFAULT NOW(),
                    accessed_at TIMESTAMPTZ DEFAULT NOW(),
                    relevance_score FLOAT DEFAULT 1.0
                );
                CREATE INDEX IF NOT EXISTS idx_memories_agent
                    ON agent_memories(agent_id, memory_type);
                CREATE INDEX IF NOT EXISTS idx_memories_embedding
                    ON agent_memories USING ivfflat (embedding vector_cosine_ops);
            """)
 
    @staticmethod
    def _vec_literal(values: list[float]) -> str:
        # pgvector's text input format, e.g. '[0.1,0.2,0.3]'. asyncpg has no
        # built-in codec for the vector type, so we pass the text form and
        # cast in SQL (registering pgvector.asyncpg.register_vector is the
        # alternative).
        return "[" + ",".join(map(str, values)) + "]"

    async def store(
        self,
        agent_id: str,
        content: str,
        memory_type: str = "long_term",
        embedding: Optional[list[float]] = None,
        metadata: Optional[dict] = None,
    ):
        async with self.pool.acquire() as conn:
            await conn.execute(
                """INSERT INTO agent_memories
                   (agent_id, memory_type, content, embedding, metadata)
                   VALUES ($1, $2, $3, $4::vector, $5)""",
                agent_id, memory_type, content,
                self._vec_literal(embedding) if embedding is not None else None,
                json.dumps(metadata or {}),
            )

    async def recall_semantic(
        self,
        agent_id: str,
        query_embedding: list[float],
        limit: int = 10,
        score_threshold: float = 0.7,
    ) -> list[dict]:
        async with self.pool.acquire() as conn:
            rows = await conn.fetch(
                """SELECT content, metadata,
                       1 - (embedding <=> $2::vector) AS similarity
                   FROM agent_memories
                   WHERE agent_id = $1
                     AND embedding IS NOT NULL
                     AND 1 - (embedding <=> $2::vector) > $3
                   ORDER BY similarity DESC
                   LIMIT $4""",
                agent_id, self._vec_literal(query_embedding),
                score_threshold, limit,
            )
            return [dict(r) for r in rows]
 
    async def recall_recent(
        self, agent_id: str, memory_type: str, limit: int = 20
    ) -> list[dict]:
        async with self.pool.acquire() as conn:
            rows = await conn.fetch(
                """SELECT content, metadata, created_at
                   FROM agent_memories
                   WHERE agent_id = $1 AND memory_type = $2
                   ORDER BY created_at DESC LIMIT $3""",
                agent_id, memory_type, limit,
            )
            return [dict(r) for r in rows]

File-Based Memory

The simplest approach: store memory as human-readable markdown files. Used by soul.py and Claude's MEMORY.md pattern.

  • Zero dependencies, human-readable, easy to debug
  • Files organized by date or topic
  • Agent reads memory files at session start, writes updates as it works
  • Limited concurrency and no semantic search without additional tooling

Structure: SOUL.md (identity/persona) + MEMORY.md (curated long-term facts) + memory/YYYY-MM-DD.md (daily session logs).

Vector Store Persistence

Persist embeddings for semantic retrieval using Qdrant, Chroma, pgvector, or similar.

  • Store facts extracted from conversations as embeddings
  • Retrieve relevant memories via similarity search at query time
  • Combine with structured storage for hybrid retrieval
  • Frameworks like Mem0 automate fact extraction from conversations and store in vector backends
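The retrieval step can be illustrated without any vector database: store (text, vector) pairs and rank them by cosine similarity. The hand-written two-dimensional vectors below stand in for real model embeddings, and the function names are illustrative.

```python
import math

def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recall_similar(memories, query_vec, top_k=3, threshold=0.0):
    """Rank stored (text, vector) pairs by similarity to the query vector."""
    scored = sorted(
        ((cosine(vec, query_vec), text) for text, vec in memories),
        reverse=True,
    )
    return [(text, score) for score, text in scored if score >= threshold][:top_k]

memories = [
    ("user prefers Python", [1.0, 0.0]),      # toy embeddings; a real system
    ("fixed bug X on March 15", [0.0, 1.0]),  # would call an embedding model
]
hits = recall_similar(memories, [1.0, 0.1], top_k=1)
```

A dedicated vector store replaces the linear scan with an approximate index, but the store/embed/rank flow is the same.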

Conversation Serialization

Dump full chat histories or agent states to JSON/YAML for reload.

  • Simple and portable – works with any storage backend
  • Bloats context without indexing or summarization
  • LangGraph uses checkpoint-based serialization: graph.get_state(checkpoint_id)
  • Best used with summarization to compress old conversations before storage

Memory Lifecycle Management

  • Temporal decay – Reduce relevance scores over time; forget stale memories
  • Consolidation – Periodically summarize episodic memories into long-term facts
  • Deduplication – Merge similar memories to prevent bloat
  • Capacity limits – Set maximum memory counts per agent and evict by relevance
  • Privacy – Implement deletion APIs for user data removal (GDPR compliance)
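Temporal decay and capacity limits from the list above can be combined in a few lines; the 30-day half-life and the memory-dict shape are illustrative assumptions.

```python
import time
from typing import Optional

HALF_LIFE_DAYS = 30.0  # relevance halves every 30 days (tunable)

def decayed_score(base: float, created_at: float, now: Optional[float] = None) -> float:
    """Exponentially decay a memory's relevance with age."""
    now = time.time() if now is None else now
    age_days = (now - created_at) / 86400
    return base * 0.5 ** (age_days / HALF_LIFE_DAYS)

def evict(memories: list, capacity: int, now: Optional[float] = None) -> list:
    """Keep only the `capacity` memories with the highest decayed relevance."""
    ranked = sorted(
        memories,
        key=lambda m: decayed_score(m["score"], m["created_at"], now),
        reverse=True,
    )
    return ranked[:capacity]
```

Consolidation and deduplication would run as periodic jobs over the same store, folding evicted episodic entries into long-term facts before they disappear.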

Frameworks

  • Mem0 – Automated memory extraction, vector + graph storage, multi-provider support
  • LangGraph – Checkpoint-based persistence with pluggable backends (SQLite, PostgreSQL, Redis)
  • soul.py – File-based memory with markdown files, zero dependencies
  • CrewAI / AutoGen – Message list serialization with configurable storage
