====== World Models for Agents ======

**World models** are internal representations that allow AI agents to simulate their environment, predict the outcomes of actions, and plan strategies without interacting directly with the real world. They enable imagination-based planning, where agents reason over future states before committing to actions.

===== Core Architecture =====

A world model typically combines several components:

  * **Transition model**: predicts how the environment state changes given an action -- $p(s_{t+1} | s_t, a_t)$
  * **Observation model**: determines what the agent perceives in each state -- $p(o_t | s_t)$
  * **Reward predictor**: estimates the expected reward for state-action pairs -- $\hat{r}(s_t, a_t)$
  * **Latent state encoder**: compresses high-dimensional observations into compact latent representations -- $z_t = \text{enc}(o_t)$

The agent can then "dream" -- simulate trajectories within the learned world model to evaluate plans without costly real-world interaction.

===== Dreamer Architecture =====

The **Dreamer** family (V1, V2, V3) by Danijar Hafner et al. represents the most successful line of world-model-based RL agents.
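The components listed under Core Architecture can be sketched as a minimal interface. The following is an illustrative toy only, assuming a hypothetical 2-D linear latent space (`ToyWorldModel`, its matrices `A` and `B`, and the `goal` reward shape are all invented for illustration, not any published architecture):

```python
import numpy as np

class ToyWorldModel:
    """Toy world model exposing the four components described above."""

    def __init__(self, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.A = np.array([[1.0, 0.1], [0.0, 0.9]])  # latent transition dynamics
        self.B = np.array([[0.0], [0.5]])            # effect of the action on the state
        self.goal = np.array([1.0, 0.0])             # reward peaks at this latent point

    def encode(self, observation):
        """Latent state encoder z_t = enc(o_t); here observations are already low-dimensional."""
        return np.asarray(observation, dtype=float)

    def transition(self, z, a, noise=0.0):
        """Transition model p(s_{t+1} | s_t, a_t); mean dynamics plus optional Gaussian noise."""
        return self.A @ z + self.B @ np.atleast_1d(a) + noise * self.rng.normal(size=2)

    def reward(self, z, a):
        """Reward predictor r_hat(s_t, a_t): higher when the latent state is near the goal."""
        return -float(np.sum((z - self.goal) ** 2))
```

With these pieces in hand, a single imagined step is `model.reward(model.transition(model.encode(o), a), a)` -- no real environment interaction required.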
**DreamerV3** (Nature, 2025) achieves mastery across 150+ diverse tasks with a single configuration:

  * Learns a Recurrent State-Space Model (RSSM) with deterministic state $h_t$ and stochastic state $z_t$: $$h_t = f_\theta(h_{t-1}, z_{t-1}, a_{t-1}), \quad z_t \sim q_\theta(z_t | h_t, o_t)$$
  * Imagines future trajectories in latent space by rolling out the prior: $\hat{z}_t \sim p_\theta(\hat{z}_t | h_t)$
  * Trains the actor and critic entirely within imagined rollouts, optimizing: $$\mathcal{J}_{\text{actor}}(\psi) = \mathbb{E}_{\text{imagine}}\left[\sum_{t=0}^{H} \gamma^t \hat{r}_t\right]$$
  * Uses symlog normalization and percentile-based return scaling for robustness

Key achievement: DreamerV3 was the first algorithm to collect a diamond in Minecraft from scratch, without human demonstrations -- a long-horizon task requiring hundreds of sequential decisions across multiple subgoals.

<code python>
# Simplified Dreamer-style imagination loop
class WorldModel:
    def __init__(self, rssm, reward_head, decoder):
        self.rssm = rssm                # recurrent state-space model (transition prior)
        self.reward_head = reward_head  # predicts reward from a latent state
        self.decoder = decoder          # reconstructs observations (used in training)

    def imagine(self, initial_state, policy, horizon=15):
        """Generate an imagined trajectory in latent space for planning."""
        states, rewards = [initial_state], []
        state = initial_state
        for t in range(horizon):
            action = policy(state)                         # act from the imagined state
            state = self.rssm.predict_next(state, action)  # roll out the learned prior
            reward = self.reward_head(state)
            states.append(state)
            rewards.append(reward)
        return states, rewards
</code>

===== Voyager: LLM-Powered World Knowledge =====

**Voyager** (NVIDIA, 2023) takes a fundamentally different approach -- using an LLM as the world model and planner for an embodied agent in Minecraft:

  * **Automatic curriculum**: the LLM proposes progressively harder exploration goals
  * **Skill library**: stores executable code snippets for complex behaviors, retrieved and composed for novel situations
  * **Environment feedback**: execution results, game state, and error messages feed back to the LLM

Unlike Dreamer's learned latent dynamics, Voyager leverages the LLM's pretrained
world knowledge. It continuously discovers new skills without human intervention, demonstrating lifelong learning in an open-ended environment.

===== LLMs as World Models =====

Recent research (2025-2026) demonstrates that LLMs can serve directly as environment simulators:

  * Fine-tuned **Qwen2.5-7B** and **Llama-3.1-8B** achieved >99% accuracy predicting state transitions in ALFWorld and ~98.6% in SciWorld
  * Even without fine-tuning, Claude Sonnet achieved 77% accuracy with just 3 examples
  * Architecture: one "World Model LLM" simulates the environment while a separate "Agent LLM" plans and acts

This decoupled approach enables training agents in simulated environments generated by LLMs, dramatically reducing the cost of environment interaction.

===== Imagination-Based Planning =====

World models enable several planning strategies:

  * **Forward rollout**: simulate action sequences and select the one with the highest cumulative predicted reward: $a_{0:H}^* = \arg\max_{a_{0:H}} \sum_{t=0}^{H} \gamma^t \hat{r}(s_t, a_t)$
  * **Model Predictive Control (MPC)**: re-plan at every step using the latest state estimate
  * **Tree search**: explore branching futures (MCTS-style) within the world model
  * **Latent planning**: optimize action sequences directly in latent space by gradient ascent, using $\nabla_{a_{0:H}} \sum_t \hat{r}(s_t, a_t)$

===== Prediction Loss =====

World models are trained by minimizing a composite loss over predicted observations, rewards, and latent state distributions:

$$\mathcal{L}_{\text{world}} = \mathbb{E}\left[\sum_{t=1}^{T}\left(\underbrace{\|o_t - \hat{o}_t\|^2}_{\text{reconstruction}} + \underbrace{(r_t - \hat{r}_t)^2}_{\text{reward prediction}} + \underbrace{D_{\text{KL}}(q(z_t|o_t) \| p(z_t|h_t))}_{\text{latent regularization}}\right)\right]$$

The KL term encourages the prior (imagination) distribution to match the posterior (observation-conditioned) distribution, ensuring that imagined rollouts remain faithful to the real dynamics.
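The forward-rollout and MPC strategies above can be combined in a simple random-shooting planner: sample candidate action sequences, score each inside the world model, and execute only the first action of the best one. A minimal sketch, assuming a hypothetical 1-D toy dynamics (`ToyDynamics`, `rollout_return`, and `mpc_step` are illustrative names, not from any library):

```python
import numpy as np

class ToyDynamics:
    """Hypothetical stand-in for a learned world model: 1-D latent state,
    reward is highest when the state is near 1.0."""
    def transition(self, z, a):
        return 0.9 * z + 0.1 * a
    def reward(self, z, a):
        return -(z - 1.0) ** 2

def rollout_return(model, z0, actions, gamma=0.99):
    """Cumulative discounted predicted reward along one imagined trajectory."""
    z, total = z0, 0.0
    for t, a in enumerate(actions):
        z = model.transition(z, a)
        total += (gamma ** t) * model.reward(z, a)
    return total

def mpc_step(model, z0, horizon=10, n_candidates=256, rng=None):
    """Random-shooting forward rollout: sample candidate action sequences,
    score each in imagination, and return the first action of the best."""
    rng = rng or np.random.default_rng(0)
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    scores = [rollout_return(model, z0, seq) for seq in candidates]
    return float(candidates[int(np.argmax(scores))][0])
```

Calling `mpc_step` once per environment step, each time with the latest state estimate, gives the re-planning loop that MPC describes; replacing the uniform sampling with gradient ascent on the actions yields the latent-planning variant.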
===== Sim-to-Real Transfer =====

For embodied agents, world models bridge simulation and reality:

  * Train policies in learned world models or physics simulators
  * Transfer them to real robots with domain randomization
  * World models generate diverse training data for factory automation, warehouse robotics, and autonomous vehicles
  * **NVIDIA Omniverse** and similar platforms use world models for synthetic data generation at scale
  * Key challenge: modeling the "reality gap" between simulated and real dynamics

===== Collaborative and Multi-Agent World Models =====

Advanced world agents maintain structured beliefs about both the environment and other agents:

  * **Collaborative Belief Worlds (CBW)**: each agent tracks physical facts (zero-order beliefs) and models of collaborators' mental states (first-order beliefs)
  * **Aspective Agentic AI**: partitions agents into information-based aspects, where each agent observes only "their world"
  * Enables intent-aware communication under partial observability

===== Genie 3 (2025) =====

**Genie 3** (DeepMind, August 2025) represents a breakthrough in generative world models:

  * Generates diverse, interactive 3D environments from text prompts
  * Simulates realistic physics within the generated worlds
  * Serves as a training ground for AI agents
  * Positioned as a milestone toward world-model-based AGI

===== References =====

  * [[https://arxiv.org/abs/2506.22355|arXiv:2506.22355 - World Models for Agents]]
  * [[https://arxiv.org/abs/2301.04104|arXiv:2301.04104 - DreamerV3: Mastering Diverse Domains]]
  * [[https://arxiv.org/abs/2305.16291|arXiv:2305.16291 - Voyager: Open-Ended Embodied Agent with LLMs]]

===== See Also =====

  * [[latent_reasoning|Latent Reasoning]] - reasoning in continuous latent space
  * [[agentic_reinforcement_learning|Agentic Reinforcement Learning]] - RL for training LLM agents
  * [[test_time_compute_scaling|Test-Time Compute Scaling]] - compute scaling via internal simulation