Effective memory management is essential for Large Language Model (LLM) agents to maintain context, recall past interactions, and enhance performance over time. As of 2025, the field has evolved from simple conversation buffers to sophisticated multi-tier memory systems inspired by human cognitive architecture. This article examines the memory landscape for LLM agents, covering memory types, dedicated memory frameworks, and general-purpose agent libraries with memory capabilities.
Agent memory systems draw from cognitive science, organizing information into complementary types:
Sensory Memory is the initial processing of raw multimodal input (vision, text, audio) through encoder modules like Vision Transformers, CLIP, and Whisper. It acts as a high-bandwidth buffer where attention mechanisms filter what gets promoted to working memory.
Short-Term/Working Memory corresponds to the LLM's context window (128K-1M+ tokens in 2025). It holds the current conversation, retrieved facts, and reasoning traces. KV caches, chain-of-thought scratchpads, and in-context learning all operate within this tier.
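The budgeting problem this tier creates can be sketched with a token-bounded sliding window. The snippet below is an illustrative, stdlib-only sketch (the `WorkingMemory` class and the 4-characters-per-token heuristic are assumptions for illustration, not a specific library's API): when the running estimate exceeds the budget, the oldest turns are evicted first.

```python
from collections import deque

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

class WorkingMemory:
    """Sliding window over conversation turns, bounded by a token budget."""

    def __init__(self, budget_tokens: int = 128_000):
        self.budget = budget_tokens
        self.turns: deque[str] = deque()
        self.used = 0

    def add(self, turn: str):
        self.turns.append(turn)
        self.used += approx_tokens(turn)
        # Evict the oldest turns once the budget is exceeded,
        # always keeping at least the most recent turn.
        while self.used > self.budget and len(self.turns) > 1:
            evicted = self.turns.popleft()
            self.used -= approx_tokens(evicted)

    def context(self) -> str:
        return "\n".join(self.turns)
```

In a real agent, evicted turns would typically be summarized or consolidated into long-term memory rather than discarded outright.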
Long-Term Memory uses external storage (vector databases, knowledge graphs, structured stores) to persist information across sessions. This tier has effectively unlimited capacity, but requires explicit retrieval mechanisms to bring relevant information back into the context window.
Explicit/Declarative Memory stores facts, events, and concepts that can be directly queried: user preferences, domain knowledge, interaction history. It is typically implemented via vector stores and knowledge graphs.
Implicit/Procedural Memory encodes learned skills and behaviors in model weights through pretraining and fine-tuning. This includes tool-use patterns, reasoning procedures, and response formatting habits.
These types are organized in hierarchical architectures where information flows between tiers through consolidation, eviction, and retrieval operations. See memory augmentation strategies for techniques that enhance these systems.
A minimal long-term store with embedding-based semantic retrieval, using the OpenAI embeddings API:

```python
from dataclasses import dataclass, field
from datetime import datetime

import numpy as np
from openai import OpenAI

client = OpenAI()


@dataclass
class MemoryEntry:
    text: str
    embedding: np.ndarray
    timestamp: datetime = field(default_factory=datetime.now)
    metadata: dict = field(default_factory=dict)


class AgentMemory:
    """Simple memory store with embedding-based semantic retrieval."""

    def __init__(self, model: str = "text-embedding-3-small"):
        self.model = model
        self.entries: list[MemoryEntry] = []

    def _embed(self, text: str) -> np.ndarray:
        resp = client.embeddings.create(input=text, model=self.model)
        return np.array(resp.data[0].embedding, dtype="float32")

    def store(self, text: str, metadata: dict | None = None):
        embedding = self._embed(text)
        self.entries.append(
            MemoryEntry(text=text, embedding=embedding, metadata=metadata or {})
        )

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        """Retrieve the top_k most relevant memories by cosine similarity."""
        query_emb = self._embed(query)
        scores = []
        for entry in self.entries:
            sim = np.dot(query_emb, entry.embedding) / (
                np.linalg.norm(query_emb) * np.linalg.norm(entry.embedding)
            )
            scores.append((sim, entry))
        scores.sort(key=lambda x: x[0], reverse=True)
        return [entry.text for _, entry in scores[:top_k]]


# Usage: store and retrieve agent memories
memory = AgentMemory()
memory.store("User prefers Python over JavaScript for backend work.")
memory.store("Last project used FastAPI with PostgreSQL.")
memory.store("User is interested in vector databases and HNSW.")

relevant = memory.retrieve("What tech stack does the user like?")
print("Retrieved memories:", relevant)
```
A new category of tools has emerged that focuses specifically on providing persistent memory for agents:
These frameworks include memory as part of broader agent capabilities:
The retrieval layer underlying agent memory relies on efficient similarity search:
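The core trick in this layer can be sketched in a few lines: if embeddings are unit-normalized up front, cosine similarity reduces to a plain dot product, and exact top-k retrieval becomes a single scored scan. This stdlib-only sketch is illustrative (the `top_k` helper is hypothetical); production systems replace the linear scan with approximate nearest-neighbor indexes such as HNSW or IVF.

```python
import heapq
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so cosine similarity becomes a dot product."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def top_k(query: list[float], index: list[tuple[str, list[float]]],
          k: int = 3) -> list[tuple[float, str]]:
    """Exact top-k by cosine similarity over pre-normalized indexed vectors.
    O(n * d) linear scan; ANN structures trade exactness for sublinear search."""
    q = normalize(query)
    scored = ((sum(a * b for a, b in zip(q, vec)), doc_id)
              for doc_id, vec in index)
    return heapq.nlargest(k, scored)
```

The linear scan is fine up to roughly hundreds of thousands of vectors; beyond that, graph-based indexes like HNSW keep query latency sublinear at the cost of approximate results.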