Generative Agents are computational software agents introduced by Park et al. (Stanford, 2023) that simulate believable human behavior by extending large language models with a three-tier cognitive architecture: a memory stream for recording experiences, a reflection mechanism for synthesizing higher-level insights, and a planning system for goal-directed behavior. The landmark paper demonstrated 25 agents living in a simulated town called Smallville, producing emergent social behaviors without explicit scripting.1)2)3)
Smallville is an interactive sandbox environment inspired by The Sims, featuring the affordances of a small village: houses, a cafe, a bar, a park, a school, a dorm, and stores, each subdivided into areas and objects (e.g., a stove, a bed) that agents can perceive and interact with.
The simulation runs continuously, with agents making autonomous decisions about where to go, who to talk to, and what to do, driven entirely by their memory, reflections, and plans.
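At the top level, such a simulation reduces to a simple tick loop. The sketch below is hypothetical: the `GenerativeAgent` class is given later in this article, and the `environment` hooks (`visible_events`, `describe_surroundings`, `apply`) are invented here for illustration.

```python
def run_simulation(agents, environment, ticks, minutes_per_tick=10):
    """Advance the sandbox tick by tick (hypothetical environment interface):
    each agent perceives its surroundings, records observations, then acts."""
    for t in range(ticks):
        now = t * minutes_per_tick
        for agent in agents:
            # Perception: everything visible to the agent enters its memory stream.
            for event in environment.visible_events(agent):  # hypothetical hook
                agent.observe(event, now)
            # Action: chosen from retrieved memories, reflections, and the plan.
            action = agent.act(environment.describe_surroundings(agent), now)
            environment.apply(agent, action)  # hypothetical hook
```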
The cognitive architecture comprises three interconnected components operating in a continuous loop:
The memory stream is a chronological record of all observations: everything the agent perceives, says, hears, or does, stored as natural-language entries with timestamps. Each entry captures a single atomic event, e.g., "Isabella Rodriguez is setting out the pastries at Hobbs Cafe."
The reflection mechanism fires periodically, triggered when the accumulated importance scores of recent events exceed a threshold: the agent synthesizes memories into higher-level abstract insights, such as "Klaus Mueller is dedicated to his research."
Reflections are themselves stored back in the memory stream, enabling recursive abstraction: reflections on reflections.
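The paper describes reflection as a two-step process: the model first proposes salient questions about recent memories, then answers each question from retrieved evidence. Here is a minimal sketch of that loop, written against the `GenerativeAgent` interface from the implementation below (the prompt wording is illustrative, not the paper's exact prompts):

```python
def reflect_two_step(agent, current_time, n_recent=100):
    """Two-step reflection: generate salient questions about recent memories,
    then synthesize an insight for each from retrieved evidence."""
    recent = "\n".join(m.description for m in agent.memory_stream[-n_recent:])
    questions = agent.llm.generate(
        f"Given only this information:\n{recent}\n"
        "What are the 3 most salient high-level questions we can answer "
        "about the subjects above? One per line."
    )
    for question in filter(None, map(str.strip, questions.split("\n"))):
        evidence = agent.retrieve(question, current_time)
        evidence_text = "\n".join(m.description for m in evidence)
        insight = agent.llm.generate(
            f"Statements:\n{evidence_text}\n"
            f"What high-level insight answers the question: {question}"
        )
        # Insights re-enter the memory stream, so later reflections can
        # build on earlier ones (reflections on reflections).
        agent.observe(f"[Reflection] {insight.strip()}", current_time)
```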
The planning system generates behavior through hierarchical decomposition: a broad agenda for the day is broken down into hour-long chunks, which are recursively refined into 5-15 minute actions.
Re-planning triggers when new observations create opportunities or conflicts (e.g., encountering a friend, discovering a relevant event).
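A sketch of that decomposition, again against the `GenerativeAgent` interface defined below. The three-level split (day agenda, hour-long chunks, 5-15 minute actions) follows the paper's description; the prompt phrasing is an assumption:

```python
def make_daily_plan(agent, date):
    """Hierarchical planning: draft a broad day agenda, then recursively
    decompose it into hour-long chunks and finally 5-15 minute actions."""
    day_agenda = agent.llm.generate(
        f"Name: {agent.name}\nIdentity: {agent.identity}\n"
        f"Outline {agent.name}'s plan for {date} in broad strokes, one per line."
    )
    plan = []
    for block in filter(None, map(str.strip, day_agenda.split("\n"))):
        hourly = agent.llm.generate(
            f"Decompose this part of {agent.name}'s day into hour-long chunks, "
            f"one per line:\n{block}"
        )
        for chunk in filter(None, map(str.strip, hourly.split("\n"))):
            actions = agent.llm.generate(
                f"Decompose into 5-15 minute actions, one per line:\n{chunk}"
            )
            plan.extend(filter(None, map(str.strip, actions.split("\n"))))
    return plan
```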
When an agent needs to act, it retrieves relevant memories using a weighted scoring function combining three factors:
$$\text{score}(m) = \alpha \cdot \text{recency}(m) + \beta \cdot \text{importance}(m) + \gamma \cdot \text{relevance}(m, q)$$
where $\text{recency}(m)$ decays exponentially with the time since $m$ was last accessed (decay factor 0.995 in the paper), $\text{importance}(m)$ is the LLM-assigned 1-10 score (normalized), $\text{relevance}(m, q)$ is the similarity between the embeddings of $m$ and the query $q$, and the weights $\alpha$, $\beta$, $\gamma$ are all set to 1 in the original implementation.
Top-$k$ scored memories are retrieved and included in the LLM prompt for action generation.
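As a standalone sketch, retrieval scoring might look as follows. Note that the original paper min-max normalizes each factor to $[0, 1]$, a detail the simplified class implementation below glosses over; the memory fields assumed here match the `Memory` dataclass defined there:

```python
import numpy as np

def retrieval_scores(memories, query_emb, now,
                     alpha=1.0, beta=1.0, gamma=1.0, decay=0.995):
    """Score memories by recency, importance, and relevance, each min-max
    normalized to [0, 1] before the weighted sum (weights default to 1)."""
    recency = np.array([decay ** (now - m.last_accessed) for m in memories])
    importance = np.array([m.importance for m in memories], dtype=float)
    relevance = np.array([float(np.dot(m.embedding, query_emb)) for m in memories])

    def normalize(x):
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    return (alpha * normalize(recency)
            + beta * normalize(importance)
            + gamma * normalize(relevance))
```

Taking `np.argsort(scores)[-k:]` then gives the indices of the top-$k$ memories.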
```python
import numpy as np
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    description: str
    timestamp: float
    importance: int          # 1-10, LLM-assigned
    embedding: np.ndarray    # Semantic embedding vector
    last_accessed: float = 0.0


class GenerativeAgent:
    def __init__(self, name, identity, llm, embedding_model):
        self.name = name
        self.identity = identity
        self.llm = llm
        self.embed = embedding_model
        self.memory_stream: List[Memory] = []
        self.plan: List[str] = []
        self.importance_sum = 0.0
        self.reflection_threshold = 150

    def observe(self, observation, current_time):
        """Add a new observation to the memory stream."""
        importance = self._score_importance(observation)
        memory = Memory(
            description=observation,
            timestamp=current_time,
            importance=importance,
            embedding=self.embed(observation),
            last_accessed=current_time,
        )
        self.memory_stream.append(memory)
        self.importance_sum += importance
        if self.importance_sum >= self.reflection_threshold:
            # Reset before reflecting: _reflect() stores insights via observe(),
            # and an unreset counter would recurse straight back into reflection.
            self.importance_sum = 0
            self._reflect(current_time)

    def retrieve(self, query, current_time, k=10):
        """Retrieve the top-k memories under the tripartite scoring function."""
        query_emb = self.embed(query)
        scores = []
        for mem in self.memory_stream:
            # Exponential decay with factor 0.995 per time step since last access.
            recency = 0.995 ** (current_time - mem.last_accessed)
            # Dot product ~ cosine similarity for unit-normalized embeddings.
            relevance = np.dot(mem.embedding, query_emb)
            score = recency + mem.importance / 10.0 + relevance
            scores.append((score, mem))
        scores.sort(key=lambda x: -x[0])
        top_memories = [m for _, m in scores[:k]]
        for m in top_memories:
            m.last_accessed = current_time  # retrieval refreshes recency
        return top_memories

    def act(self, context, current_time):
        """Decide what to do based on retrieved memories and the current plan."""
        memories = self.retrieve(context, current_time)
        mem_text = "\n".join(m.description for m in memories)
        prompt = (
            f"Name: {self.name}\nIdentity: {self.identity}\n"
            f"Current plan: {self.plan}\n"
            f"Relevant memories:\n{mem_text}\n"
            f"Current context: {context}\n"
            f"What does {self.name} do next?"
        )
        return self.llm.generate(prompt)

    def _reflect(self, current_time):
        """Synthesize recent memories into higher-level insights."""
        recent = self.memory_stream[-100:]
        mem_text = "\n".join(m.description for m in recent)
        prompt = (
            f"Given these recent observations:\n{mem_text}\n"
            f"What 3 high-level insights can you infer?"
        )
        insights = self.llm.generate(prompt)
        for insight in insights.split("\n"):
            if insight.strip():
                # Reflections re-enter the memory stream as first-class memories.
                self.observe(f"[Reflection] {insight.strip()}", current_time)

    def _score_importance(self, observation):
        """Ask the LLM to rate the importance of an event on a 1-10 scale."""
        prompt = (
            f"Rate the importance of this event (1-10):\n"
            f"{observation}\nScore:"
        )
        return int(self.llm.generate(prompt).strip())
```
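A quick smoke test of the class above. The stub LLM and hash-seeded random embedder are hypothetical placeholders standing in for a real chat model and text-embedding model:

```python
class StubLLM:
    """Hypothetical stand-in for a real LLM client."""
    def generate(self, prompt):
        return "5" if prompt.endswith("Score:") else "John goes to Hobbs Cafe."

def stub_embed(text):
    """Deterministic-per-text random unit vector (placeholder embedding)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)  # unit norm, so dot product ~ cosine similarity

agent = GenerativeAgent("John Lin", "a friendly pharmacy shopkeeper",
                        StubLLM(), stub_embed)
agent.observe("John Lin woke up at 7am.", current_time=7.0)
agent.observe("John Lin saw his neighbor watering plants.", current_time=7.5)
print(agent.act("It is 8am and the pharmacy opens soon.", current_time=8.0))
```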
Over two simulated days with minimal human intervention, the 25 agents produced remarkable emergent phenomena: information diffused through town by word of mouth (news of Sam Moore's mayoral candidacy and of Isabella Rodriguez's Valentine's Day party spread from agent to agent), agents formed relationships and remembered past interactions with one another, and they coordinated group activity, culminating in several agents independently arriving at Hobbs Cafe for Isabella's party at the appointed time.
None of these behaviors were scripted; they emerged purely from the interaction of individual agent architectures with each other and the environment.
The evaluation combined technical ablations with human assessments: agents were "interviewed" in natural language to probe their self-knowledge, memory, plans, and reactions; ablation studies disabled observation, reflection, or planning in turn, with each removal degrading behavior; and human evaluators ranked the full architecture as more believable than every ablated variant, and even than a human-crowdworker role-play baseline.