Agents operating in partially observable environments need structured memory to reason about unseen states and plan effective actions. AriGraph (IJCAI 2025) introduces a knowledge graph world model that LLM agents dynamically construct during exploration, integrating semantic and episodic memories into a queryable graph structure that dramatically improves reasoning, planning, and decision-making.
Traditional approaches to agent memory fall short in complex environments: keeping the full interaction history quickly exceeds context limits, while summarization discards the fine-grained relational detail needed for planning.
AriGraph addresses these shortcomings by maintaining a dynamically growing knowledge graph $G = (V, E)$, where nodes $V$ represent entities (objects, locations, characters) and edges $E$ represent relationships (spatial, functional, causal) discovered through exploration.
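As a concrete sketch (the entity and relation names here are hypothetical, not from the paper), such a graph can be represented as a node table plus a list of typed relation triples:

```python
# Minimal representation of G = (V, E): nodes keyed by id,
# edges as (source, relation, target) triples.
graph = {
    "nodes": {
        "kitchen": {"type": "location"},
        "knife":   {"type": "object"},
        "chef":    {"type": "character"},
    },
    "edges": [
        ("kitchen", "contains", "knife"),  # spatial relation
        ("chef", "can_use", "knife"),      # functional relation
    ],
}
```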
AriGraph integrates two complementary memory types, inspired by cognitive science:
Episodic Memory: Stores specific events and observations from agent interactions. Each exploration step generates episodic entries that capture what the agent saw, did, and experienced at a particular moment.
Semantic Memory: Accumulates general knowledge derived from episodic experiences. Over time, repeated observations about object properties, spatial layouts, and functional relationships are abstracted into stable semantic knowledge.
The relationship between the two can be expressed as:
$$M_{\text{semantic}} = \text{Abstract}(\{m_1, m_2, \ldots, m_t\}_{\text{episodic}})$$
Semantic memory builds on episodic memory, creating a structured base for associative recall that supports long-term knowledge accumulation beyond what unstructured methods can achieve.
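A minimal sketch of this abstraction step, assuming episodic entries carry simple property strings and a fixed support threshold (both illustrative choices, not the paper's implementation):

```python
from collections import Counter

def abstract_semantic(episodic_entries, min_support=3):
    """Toy Abstract(): promote a property to semantic memory once it
    has been observed at least `min_support` times across episodes."""
    counts = Counter()
    for entry in episodic_entries:
        for prop in entry["properties"]:
            counts[prop] += 1
    return {prop for prop, n in counts.items() if n >= min_support}

episodes = [
    {"properties": ["door:locked", "door:wooden"]},
    {"properties": ["door:locked"]},
    {"properties": ["door:locked", "key:rusty"]},
]
print(abstract_semantic(episodes))  # → {'door:locked'}
```

Only `door:locked` recurs often enough to be promoted; one-off observations stay episodic.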
The Ariadne agent processes local observations from the environment and incrementally builds the graph:
Growth Phase: During initial exploration, new observations add nodes and edges rapidly as the agent discovers new locations, objects, and relationships.
Stabilization Phase: As the agent becomes familiar with the environment, graph growth flattens – an indicator of effective generalization rather than redundant storage.
Cleaning Phase: Pruning mechanisms remove redundant or contradictory entries to maintain graph quality.
The graph update function at each timestep $t$ is:
$$G_{t+1} = \text{Clean}(G_t \cup \text{Extract}(o_t))$$
where $o_t$ is the observation at time $t$ and Extract identifies new entities and relations.
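The update rule can be sketched over sets of relation triples; `extract` and `clean` below are toy stand-ins for the paper's LLM-based extraction and pruning:

```python
def extract(observation):
    # Toy Extract(o_t): parse "src, relation, dst" triples,
    # one per line of the observation string.
    triples = set()
    for line in observation.splitlines():
        src, rel, dst = line.split(",")
        triples.add((src.strip(), rel.strip(), dst.strip()))
    return triples

def clean(graph):
    # Toy Clean(): drop self-loops as a stand-in for pruning
    # redundant or contradictory entries.
    return {(s, r, d) for (s, r, d) in graph if s != d}

def update(graph, observation):
    # G_{t+1} = Clean(G_t ∪ Extract(o_t))
    return clean(graph | extract(observation))

g = set()
g = update(g, "kitchen, contains, knife\nkitchen, north_of, hall")
```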
AriGraph serves as the core memory component in a cognitive loop: the agent observes the environment, updates the graph with newly extracted entities and relations, retrieves a goal-relevant subgraph, and plans its next action over that subgraph.
This enables efficient multi-hop inference, tracing paths between locations and objects for planning without needing the full observation history.
```python
class AriGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []
        self.episodic_memory = []
        self.semantic_memory = {}

    def update(self, observation, action, timestep):
        # Record the raw episodic entry for this step.
        self.episodic_memory.append({
            "observation": observation,
            "action": action,
            "timestep": timestep,
        })
        # Extract(o_t): identify new entities and relations.
        entities = self.extract_entities(observation)
        relations = self.extract_relations(observation)
        for entity in entities:
            if entity.id not in self.nodes:
                self.nodes[entity.id] = entity
            else:
                self.nodes[entity.id].update_properties(entity)
        for relation in relations:
            self.edges.append(relation)
        # Abstract episodic experience into semantic knowledge,
        # then Clean() the graph.
        self.abstract_semantic_knowledge()
        self.clean_redundancies()

    def retrieve_relevant_subgraph(self, current_state, goal):
        relevant_nodes = self.find_related_nodes(current_state, goal)
        subgraph = self.extract_subgraph(relevant_nodes, max_hops=3)
        return subgraph

    def plan_with_graph(self, current_state, goal, llm):
        subgraph = self.retrieve_relevant_subgraph(current_state, goal)
        path = self.find_path(current_state.location, goal.location)
        prompt = self.build_planning_prompt(
            current=current_state,
            goal=goal,
            knowledge=subgraph,
            path=path,
        )
        return llm.generate_plan(prompt)

    def abstract_semantic_knowledge(self):
        # Group episodic observations by entity, then generalize
        # entities observed at least three times into semantic memory.
        entity_observations = {}
        for episode in self.episodic_memory:
            for entity in self.extract_entities(episode["observation"]):
                entity_observations.setdefault(entity.id, []).append(entity)
        for eid, observations in entity_observations.items():
            if len(observations) >= 3:
                self.semantic_memory[eid] = self.generalize(observations)
```
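The class above leaves several helpers unimplemented (`extract_entities`, `find_path`, and others). As one illustrative sketch, assuming edges are stored as `(source, relation, target)` triples, `find_path` can be a breadth-first search:

```python
from collections import deque

def find_path(edges, start, goal):
    """BFS sketch of a find_path helper: edges are (src, relation, dst)
    triples; returns a list of node ids, or None if goal is unreachable."""
    adjacency = {}
    for src, _rel, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

edges = [("hall", "north_of", "kitchen"), ("kitchen", "contains", "pantry")]
print(find_path(edges, "hall", "pantry"))  # → ['hall', 'kitchen', 'pantry']
```

BFS returns a shortest path in hop count, which is a natural fit for the multi-hop location queries the planner issues.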
AriGraph is evaluated on interactive text games (TextWorld environments) and static multi-hop QA:
Text Games: AriGraph agents outperform both memory baselines (full history, summarization) and RL-based methods in complex games requiring spatial reasoning and object manipulation. Graph quality improves over time as the agent explores.
Multi-Hop QA: Achieves 68.6% accuracy with a GPT-4o-mini backbone, competitive with dedicated knowledge graph methods while being fully dynamic (no pre-built knowledge base required).
Key findings: structured graph memory consistently outperforms unstructured baselines, and the quality of the constructed graph improves as exploration proceeds.
AriGraph's design draws explicitly from theories of human memory: