World models are internal representations that allow AI agents to simulate their environment, predict outcomes of actions, and plan strategies without directly interacting with the real world. They enable imagination-based planning where agents reason over future states before committing to actions.
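A world model typically combines several components: an encoder that maps observations to latent states, a dynamics model that predicts the next latent state from the current state and action, a reward predictor, and a decoder that reconstructs observations.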
The agent can then “dream” – simulate trajectories within the learned world model to evaluate plans without costly real-world interaction.
The Dreamer family (V1, V2, V3) by Danijar Hafner et al. is one of the most prominent lines of world-model-based RL agents.
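DreamerV3 (Nature, 2025) achieves mastery across 150+ diverse tasks with a single configuration. Its recurrent state-space model (RSSM) carries a deterministic state $h_t$ and a stochastic latent $z_t$, and its actor is trained on imagined returns: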
$$h_t = f_\theta(h_{t-1}, z_{t-1}, a_{t-1}), \quad z_t \sim q_\theta(z_t | h_t, o_t)$$
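The recurrence above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the dimensions, the random linear maps `W_f` and `W_q`, the `tanh` update, and the diagonal Gaussian posterior are all hypothetical stand-ins, not the learned networks or latent distributions used by Dreamer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
H_DIM, Z_DIM, A_DIM, O_DIM = 8, 4, 2, 6

# Random linear maps stand in for the learned networks f_theta and q_theta.
W_f = rng.normal(scale=0.1, size=(H_DIM, H_DIM + Z_DIM + A_DIM))
W_q = rng.normal(scale=0.1, size=(2 * Z_DIM, H_DIM + O_DIM))


def rssm_step(h_prev, z_prev, a_prev, o_t):
    """One filtering step: update h_t deterministically, then sample z_t."""
    # h_t = f_theta(h_{t-1}, z_{t-1}, a_{t-1})
    h_t = np.tanh(W_f @ np.concatenate([h_prev, z_prev, a_prev]))
    # z_t ~ q_theta(z_t | h_t, o_t), assumed here to be a diagonal Gaussian
    stats = W_q @ np.concatenate([h_t, o_t])
    mean, log_std = stats[:Z_DIM], stats[Z_DIM:]
    z_t = mean + np.exp(log_std) * rng.standard_normal(Z_DIM)
    return h_t, z_t


# Filter a short sequence of random actions and observations.
h, z = np.zeros(H_DIM), np.zeros(Z_DIM)
for _ in range(3):
    a = rng.normal(size=A_DIM)
    o = rng.normal(size=O_DIM)
    h, z = rssm_step(h, z, a, o)
```

The deterministic state `h` summarizes history, while `z` is resampled at every step from a distribution that also sees the current observation.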
$$\mathcal{J}_{\text{actor}}(\psi) = \mathbb{E}_{\text{imagine}}\left[\sum_{t=0}^{H} \gamma^t \hat{r}_t\right]$$
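As a small worked example of the quantity inside the expectation, the sketch below sums discounted rewards along one imagined trajectory. The function name `discounted_return` and the sample rewards are illustrative assumptions; it computes the plain sum from the formula and does not include the value bootstrapping that practical implementations typically add.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over an imagined reward sequence."""
    total = 0.0
    # Accumulate backwards so each step is a single multiply-add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


imagined_rewards = [1.0, 0.0, 2.0]
print(discounted_return(imagined_rewards, gamma=0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```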
Key achievement: DreamerV3 was the first algorithm to collect a diamond in Minecraft from scratch without human demonstrations – a long-horizon task requiring hundreds of sequential decisions across multiple subgoals.
    # Simplified Dreamer imagination loop
    class WorldModel:
        def __init__(self, rssm, reward_head, decoder):
            self.rssm = rssm                # latent dynamics model
            self.reward_head = reward_head  # predicts reward from a latent state
            self.decoder = decoder          # reconstructs observations (unused in imagination)

        def imagine(self, initial_state, policy, horizon=15):
            """Generate imagined trajectory for planning."""
            states, rewards = [initial_state], []
            state = initial_state
            for _ in range(horizon):
                # Act and step entirely in latent space; no environment calls.
                action = policy(state)
                state = self.rssm.predict_next(state, action)
                reward = self.reward_head(state)
                states.append(state)
                rewards.append(reward)
            return states, rewards
Voyager (NVIDIA, 2023) takes a fundamentally different approach – using an LLM as the world model and planner for an embodied agent in Minecraft:
Unlike Dreamer's learned latent dynamics, Voyager leverages the LLM's pretrained world knowledge. It continuously discovers new skills without human intervention, demonstrating lifelong learning in an open-ended environment.
Recent research (2025-2026) demonstrates that LLMs can serve directly as environment simulators:
This decoupled approach enables training agents in simulated environments generated by LLMs, dramatically reducing the cost of environment interaction.
World models enable several planning strategies:
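One simple strategy of this kind is random-shooting model-predictive control: sample many candidate action sequences, roll each out inside the model, and execute the first action of the best one. The sketch below is a minimal illustration, assuming a hypothetical one-dimensional `toy_dynamics` and `toy_reward` in place of a learned world model.

```python
import numpy as np


def toy_dynamics(state, action):
    """Hypothetical stand-in for a learned model: a 1-D point moved by `action`."""
    return state + action


def toy_reward(state):
    """Hypothetical reward: higher when the point is close to the goal at 1.0."""
    return -abs(state - 1.0)


def plan_random_shooting(state, horizon=5, n_candidates=200, seed=0):
    """Return the first action of the best sampled action sequence."""
    rng = np.random.default_rng(seed)
    # Sample candidate action sequences uniformly in [-0.5, 0.5].
    candidates = rng.uniform(-0.5, 0.5, size=(n_candidates, horizon))
    best_score, best_first_action = -np.inf, 0.0
    for actions in candidates:
        s, score = state, 0.0
        # Roll the sequence out inside the model only.
        for a in actions:
            s = toy_dynamics(s, a)
            score += toy_reward(s)
        if score > best_score:
            best_score, best_first_action = score, float(actions[0])
    return best_first_action


action = plan_random_shooting(state=0.0)
```

In a receding-horizon setup, the agent would apply `action`, observe the new state, and replan from there.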
World models are trained by minimizing a composite loss over predicted observations, rewards, and latent state distributions:
$$\mathcal{L}_{\text{world}} = \mathbb{E}\left[\sum_{t=1}^{T}\left(\underbrace{\|o_t - \hat{o}_t\|^2}_{\text{reconstruction}} + \underbrace{(r_t - \hat{r}_t)^2}_{\text{reward prediction}} + \underbrace{D_{\text{KL}}\left(q_\theta(z_t \mid h_t, o_t) \,\|\, p_\theta(z_t \mid h_t)\right)}_{\text{latent regularization}}\right)\right]$$
The KL term encourages the prior (imagination) distribution to match the posterior (observation-conditioned) distribution, which helps imagined rollouts remain faithful to real dynamics.
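To make the three terms concrete, the following sketch evaluates the loss for a single time step. It assumes diagonal Gaussian prior and posterior distributions, for which the KL divergence has a closed form; the function names and sample values are illustrative, and practical agents may use other latent distributions and additional loss scaling.

```python
import numpy as np


def gaussian_kl(mean_q, std_q, mean_p, std_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions."""
    var_ratio = (std_q / std_p) ** 2
    mean_term = ((mean_q - mean_p) / std_p) ** 2
    return 0.5 * np.sum(var_ratio + mean_term - 1.0 - np.log(var_ratio))


def world_model_loss(obs, obs_hat, reward, reward_hat, posterior, prior):
    """Per-step loss: reconstruction + reward prediction + KL regularizer."""
    reconstruction = np.sum((obs - obs_hat) ** 2)
    reward_error = (reward - reward_hat) ** 2
    kl = gaussian_kl(*posterior, *prior)
    return reconstruction + reward_error + kl


obs = np.array([1.0, 2.0])
obs_hat = np.array([1.0, 1.0])
posterior = (np.zeros(2), np.ones(2))  # (mean, std) of q
prior = (np.zeros(2), np.ones(2))      # (mean, std) of p
loss = world_model_loss(obs, obs_hat, reward=1.0, reward_hat=0.5,
                        posterior=posterior, prior=prior)
print(loss)  # 1.0 + 0.25 + 0.0 = 1.25
```

When the prior and posterior coincide, as in this example, the KL term vanishes and only the prediction errors contribute.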
For embodied agents, world models bridge simulation and reality:
Advanced world agents maintain structured beliefs about environments and other agents:
Genie 3 (DeepMind, August 2025) represents a breakthrough in generative world models: