Knowledge Graph World Models: AriGraph

Agents operating in partially observable environments need structured memory to reason about unseen states and plan effective actions. AriGraph (IJCAI 2025) introduces a knowledge graph world model that LLM agents dynamically construct during exploration, integrating semantic and episodic memories into a queryable graph structure that dramatically improves reasoning, planning, and decision-making.

The Memory Problem in Agent Systems

Traditional approaches to agent memory fall short in complex environments:

  1. Full-history prompting: feeding the entire interaction history into the context window does not scale and buries relevant facts in noise
  2. Summarization: compressing history into running summaries loses the fine-grained details needed for precise spatial and object reasoning
  3. Unstructured retrieval: similarity search over flat text struggles with the multi-hop, relational queries that planning requires

AriGraph solves these by maintaining a dynamically growing knowledge graph $G = (V, E)$ where nodes $V$ represent entities (objects, locations, characters) and edges $E$ represent relationships (spatial, functional, causal) discovered through exploration.

Dual Memory Architecture

AriGraph integrates two complementary memory types, inspired by cognitive science:

Episodic Memory: Stores specific events and observations from agent interactions. Each exploration step generates episodic entries that capture what the agent saw, did, and experienced at a particular moment.

Semantic Memory: Accumulates general knowledge derived from episodic experiences. Over time, repeated observations about object properties, spatial layouts, and functional relationships are abstracted into stable semantic knowledge.

The relationship between the two can be expressed as:

$$M_{\text{semantic}} = \text{Abstract}(\{m_1, m_2, \ldots, m_t\}_{\text{episodic}})$$

Semantic memory builds on episodic memory, creating a structured base for associative recall that supports long-term knowledge accumulation beyond what unstructured methods can achieve.
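The abstraction step can be sketched concretely. This is a minimal illustration, not the paper's implementation: episodic entries are assumed to be dicts with `entity` and `property` keys, and the support threshold `min_support` is an assumption chosen for the example.

```python
from collections import defaultdict

def abstract_semantic(episodic, min_support=2):
    # Group episodic observations by entity...
    by_entity = defaultdict(list)
    for entry in episodic:
        by_entity[entry["entity"]].append(entry["property"])
    # ...and promote only properties observed repeatedly to semantic memory.
    semantic = {}
    for entity, props in by_entity.items():
        stable = {p for p in props if props.count(p) >= min_support}
        if stable:
            semantic[entity] = stable
    return semantic

episodic = [
    {"entity": "key", "property": "opens chest"},
    {"entity": "key", "property": "opens chest"},
    {"entity": "key", "property": "on table"},
]
print(abstract_semantic(episodic))  # {'key': {'opens chest'}}
```

A one-off observation ("on table") stays episodic, while the repeated one ("opens chest") is consolidated, mirroring how stable knowledge accumulates over time.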

Graph Construction and Update

The Ariadne agent processes local observations from the environment and incrementally builds the graph:

Growth Phase: During initial exploration, new observations add nodes and edges rapidly as the agent discovers new locations, objects, and relationships.

Stabilization Phase: As the agent becomes familiar with the environment, graph growth flattens – an indicator of effective generalization rather than redundant storage.

Cleaning Phase: Pruning mechanisms remove redundant or contradictory entries to maintain graph quality.

The graph update function at each timestep $t$ is:

$$G_{t+1} = \text{Clean}(G_t \cup \text{Extract}(o_t))$$

where $o_t$ is the observation at time $t$ and Extract identifies new entities and relations.
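The update rule can be illustrated with a toy graph of (head, relation, tail) triples. This is a hedged sketch, not the paper's code: here Extract assumes observations arrive pre-parsed as triples (a real agent would use the LLM for this), and treating "in" as a single-valued relation whose newest value wins is an illustrative assumption about Clean.

```python
def extract(observation):
    # Toy Extract: observations are already (head, relation, tail) triples.
    return observation

def clean(triples, functional=("in",)):
    # Toy Clean: for single-valued relations, a newer edge replaces the old
    # one, resolving the contradiction; other edges are kept as-is.
    result = []
    for h, r, t in triples:
        if r in functional:
            result = [(h2, r2, t2) for h2, r2, t2 in result
                      if not (h2 == h and r2 == r)]
        result.append((h, r, t))
    return result

def update(graph, observation):
    # G_{t+1} = Clean(G_t ∪ Extract(o_t))
    return clean(graph + extract(observation))

G = []
G = update(G, [("key", "in", "chest"), ("chest", "located_in", "cellar")])
G = update(G, [("key", "in", "inventory")])
print(G)  # the key's "in" edge now points to inventory, not the chest
```

After the second update the stale `("key", "in", "chest")` edge is gone, while the unaffected spatial fact about the chest survives.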

Retrieval-Planning-Decision Loop

AriGraph serves as the core memory component in a cognitive loop:

  1. Retrieval: Given the current state, relevant subgraphs are extracted from AriGraph
  2. Planning: The LLM uses retrieved knowledge to generate action plans, leveraging multi-hop graph traversal for spatial reasoning
  3. Decision: The agent selects and executes an action based on the plan
  4. Update: New observations update the graph, closing the loop

This enables efficient multi-hop inference – tracing paths between locations and objects for planning without needing the full observation history.
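The multi-hop traversal underlying step 2 can be sketched as a breadth-first search over the spatial subgraph. The edge format (undirected room-to-room pairs) and the room names are assumptions for illustration only.

```python
from collections import deque

def find_path(edges, start, goal):
    # Build an undirected adjacency list from (room, room) connection edges.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    # BFS returns the shortest multi-hop path between the two locations.
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no known route yet

rooms = [("cellar", "kitchen"), ("kitchen", "garden"), ("garden", "shed")]
print(find_path(rooms, "cellar", "shed"))
# ['cellar', 'kitchen', 'garden', 'shed']
```

Because the path comes from the graph rather than raw history, the planner only needs the retrieved subgraph, not every past observation.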

Code Example: Knowledge Graph World Model

class AriGraph:
    """Knowledge graph world model combining episodic and semantic memory.

    Helpers such as extract_entities, extract_relations, find_related_nodes,
    extract_subgraph, find_path, build_planning_prompt, and generalize stand
    in for LLM-backed or graph-library operations and are left abstract here.
    """

    def __init__(self):
        self.nodes = {}            # entity id -> entity node
        self.edges = []            # relations between entities
        self.episodic_memory = []  # raw per-step observations
        self.semantic_memory = {}  # entity id -> generalized knowledge

    def update(self, observation, action, timestep):
        # Record what the agent saw and did at this step in episodic memory.
        entities = self.extract_entities(observation)
        relations = self.extract_relations(observation)
        self.episodic_memory.append({
            "observation": observation,
            "action": action,
            "timestep": timestep,
            "entities": entities,  # cached so abstraction need not re-extract
        })
        # Merge new entities into the graph, updating known ones in place.
        for entity in entities:
            if entity.id not in self.nodes:
                self.nodes[entity.id] = entity
            else:
                self.nodes[entity.id].update_properties(entity)
        # Add newly observed relations, skipping exact duplicates.
        for relation in relations:
            if relation not in self.edges:
                self.edges.append(relation)
        self.abstract_semantic_knowledge()
        self.clean_redundancies()

    def retrieve_relevant_subgraph(self, current_state, goal):
        # Retrieval: pull only nodes related to the current state and goal,
        # expanded by a bounded number of hops.
        relevant_nodes = self.find_related_nodes(current_state, goal)
        return self.extract_subgraph(relevant_nodes, max_hops=3)

    def plan_with_graph(self, current_state, goal, llm):
        # Planning: combine the retrieved subgraph with an explicit
        # multi-hop path between locations, then prompt the LLM.
        subgraph = self.retrieve_relevant_subgraph(current_state, goal)
        path = self.find_path(current_state.location, goal.location)
        prompt = self.build_planning_prompt(
            current=current_state,
            goal=goal,
            knowledge=subgraph,
            path=path,
        )
        return llm.generate_plan(prompt)

    def abstract_semantic_knowledge(self):
        # Abstraction: entities observed repeatedly across episodes are
        # generalized into stable semantic knowledge.
        entity_observations = {}
        for episode in self.episodic_memory:
            for entity in episode["entities"]:
                entity_observations.setdefault(entity.id, []).append(entity)
        for eid, observations in entity_observations.items():
            if len(observations) >= 3:  # support threshold before abstracting
                self.semantic_memory[eid] = self.generalize(observations)

Evaluation Results

AriGraph is evaluated on interactive text games (TextWorld environments) and static multi-hop QA:

Text Games: AriGraph agents outperform both memory baselines (full history, summarization) and RL-based methods in complex games requiring spatial reasoning and object manipulation. Graph quality improves over time as the agent explores.

Multi-Hop QA: Achieves 68.6% accuracy with GPT-4o-mini as the backbone model, competitive with dedicated knowledge graph methods while being fully dynamic (no pre-built knowledge base required).

Key findings:

  1. Structured graph memory outperforms full-history, summarization, and RL-based baselines on complex text games
  2. Graph quality and task performance improve together as exploration proceeds
  3. Dynamic graph construction removes the need for any pre-built knowledge base

Agent Cognitive Architecture Diagram

flowchart TD
    A[Environment] --> B[Observation]
    B --> C[Entity & Relation Extraction]
    C --> D[AriGraph Update]
    D --> E[Episodic Memory]
    D --> F[Semantic Memory]
    E --> F
    G[Current State + Goal] --> H[Subgraph Retrieval]
    F --> H
    E --> H
    H --> I[Multi-Hop Reasoning]
    I --> J[LLM Planning]
    J --> K[Action Selection]
    K --> L[Environment Step]
    L --> B

Connections to Cognitive Science

AriGraph's design draws explicitly from theories of human memory: the classic distinction between episodic and semantic memory, and the consolidation view in which repeated episodic traces are gradually abstracted into general semantic knowledge. The agent performs the same process when it generalizes recurring observations into stable graph entries.
