World models are internal representations that allow AI agents to simulate their environment, predict outcomes of actions, and plan strategies without directly interacting with the real world. They enable imagination-based planning where agents reason over future states before committing to actions.
A world model typically combines several components:
The agent can then “dream” – simulate trajectories within the learned world model to evaluate plans without costly real-world interaction.
The Dreamer family (V1, V2, V3) by Danijar Hafner et al. is among the most prominent lines of world-model-based RL agents.
DreamerV3 (Nature, 2025) achieves mastery across 150+ diverse tasks with a single configuration:
$$h_t = f_\theta(h_{t-1}, z_{t-1}, a_{t-1}), \quad z_t \sim q_\theta(z_t | h_t, o_t)$$
$$\mathcal{J}_{\text{actor}}(\psi) = \mathbb{E}_{\text{imagine}}\left[\sum_{t=0}^{H} \gamma^t \hat{r}_t\right]$$
Key achievement: DreamerV3 was the first algorithm to collect a diamond in Minecraft from scratch without human demonstrations – a long-horizon task requiring hundreds of sequential decisions across multiple subgoals.
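The recurrence for $h_t$ and $z_t$ given earlier can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not DreamerV3's implementation: the transition function is a single tanh layer, the posterior is a unit-variance Gaussian, and the names `rssm_step` and `init_weights` as well as all dimensions are hypothetical.

```python
import math
import random


def init_weights(h_dim, z_dim, a_dim, o_dim, rng):
    """Random weight matrices for the toy transition f and posterior q."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "f": matrix(h_dim, h_dim + z_dim + a_dim),  # maps (h, z, a) to the next h
        "q": matrix(z_dim, h_dim + o_dim),          # maps (h, o) to the mean of z
    }


def rssm_step(h_prev, z_prev, a_prev, obs, weights, rng):
    """One step: deterministic h_t, then a sampled stochastic z_t."""
    # h_t = f(h_{t-1}, z_{t-1}, a_{t-1}); assumption: f is one tanh layer.
    inputs = h_prev + z_prev + a_prev
    h = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights["f"]]
    # z_t ~ q(z_t | h_t, o_t); assumption: unit-variance Gaussian posterior.
    post_in = h + obs
    mean = [sum(w * x for w, x in zip(row, post_in)) for row in weights["q"]]
    z = [rng.gauss(m, 1.0) for m in mean]
    return h, z


rng = random.Random(0)
weights = init_weights(h_dim=3, z_dim=2, a_dim=1, o_dim=2, rng=rng)
h, z = [0.0] * 3, [0.0] * 2
# Each pair is (previous action, current observation).
for a_prev, obs in [([1.0], [1.0, 0.0]), ([-1.0], [0.0, 1.0])]:
    h, z = rssm_step(h, z, a_prev, obs, weights, rng)
```

The deterministic state `h` carries history forward, while `z` injects the information from the current observation. During imagination the observation is unavailable, so `z` would be sampled from a prior conditioned on `h` alone.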
# Simplified Dreamer imagination loop
class WorldModel:
    def __init__(self, rssm, reward_head, decoder):
        self.rssm = rssm
        self.reward_head = reward_head
        self.decoder = decoder

    def imagine(self, initial_state, policy, horizon=15):
        """Generate imagined trajectory for planning."""
        states, rewards = [initial_state], []
        state = initial_state
        for t in range(horizon):
            action = policy(state)
            state = self.rssm.predict_next(state, action)
            reward = self.reward_head(state)
            states.append(state)
            rewards.append(reward)
        return states, rewards
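The rewards returned by an imagination loop like this one feed the actor objective $\mathcal{J}_{\text{actor}}$ given earlier. A minimal sketch follows, using the simplified discounted-sum form from the text. The function names `imagined_return` and `actor_objective` are hypothetical, and the expectation is approximated by averaging over a handful of imagined rollouts.

```python
def imagined_return(rewards, gamma=0.99):
    """Discounted sum of predicted rewards: sum over t of gamma^t * r_hat_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def actor_objective(rollouts, gamma=0.99):
    """Monte Carlo estimate of the expectation over imagined trajectories."""
    returns = [imagined_return(rewards, gamma) for rewards in rollouts]
    return sum(returns) / len(returns)


# Two imagined reward sequences, e.g. from two calls to an imagination loop.
rollouts = [[1.0, 0.0, 2.0], [0.0, 0.0, 0.0]]
single = imagined_return(rollouts[0], gamma=0.5)  # 1.0 + 0.0 + 0.25 * 2.0
objective = actor_objective(rollouts, gamma=0.5)
```

In practice the policy parameters would be updated by gradient ascent on this quantity; the sketch only shows how the scalar is assembled from imagined rewards.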
Voyager (NVIDIA, 2023) takes a fundamentally different approach – using an LLM as the world model and planner for an embodied agent in Minecraft:
Unlike Dreamer's learned latent dynamics, Voyager leverages the LLM's pretrained world knowledge. It continuously discovers new skills without human intervention, demonstrating lifelong learning in an open-ended environment.
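To make the idea of accumulating and reusing skills concrete, here is a toy skill store. It is a sketch under loud assumptions, not Voyager's code: the class name `SkillLibrary`, the stored code strings, and the word-overlap ranking are all hypothetical stand-ins (Voyager retrieves skills by embedding similarity).

```python
class SkillLibrary:
    """Stores skills as code strings keyed by a natural-language description."""

    def __init__(self):
        self.skills = {}  # description -> code string

    def add(self, description, code):
        self.skills[description] = code

    def retrieve(self, task, k=1):
        """Return the k skills whose descriptions share the most words with the task."""
        task_words = set(task.lower().split())

        def overlap(description):
            return len(task_words & set(description.lower().split()))

        ranked = sorted(self.skills, key=overlap, reverse=True)
        return [(d, self.skills[d]) for d in ranked[:k]]


library = SkillLibrary()
library.add("craft wooden pickaxe", "craft('wooden_pickaxe')")  # hypothetical skill code
library.add("mine iron ore", "mine('iron_ore')")                # hypothetical skill code
best = library.retrieve("mine some iron")
```

Retrieved skills would be placed in the LLM's prompt so that new tasks can build on earlier solutions instead of starting from scratch.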
Recent research (2025-2026) demonstrates that LLMs can serve directly as environment simulators:
This decoupled approach enables training agents in simulated environments generated by LLMs, dramatically reducing the cost of environment interaction.
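As an illustration of the interface such a simulator might expose, the sketch below wraps a text-completion function in a gym-style `step` method. Everything here is an assumption made for the example: the class name `LLMSimulatedEnv`, the JSON reply format, and the prompt wording. `fake_llm` is a stand-in so the code runs offline; a real system would call an actual model instead.

```python
import json


class LLMSimulatedEnv:
    """Environment whose transitions come from a text-completion callable."""

    def __init__(self, complete, initial_observation):
        self.complete = complete  # hypothetical: any function str -> str
        self.observation = initial_observation
        self.history = []

    def step(self, action):
        prompt = (
            "You simulate an environment. Reply with JSON containing "
            '"observation", "reward", and "done".\n'
            f"Current observation: {self.observation}\n"
            f"Agent action: {action}\n"
        )
        reply = json.loads(self.complete(prompt))
        self.observation = reply["observation"]
        self.history.append((action, reply))
        return reply["observation"], float(reply["reward"]), bool(reply["done"])


def fake_llm(prompt):
    """Stand-in for a real model: the door opens only when asked to."""
    done = "open door" in prompt
    return json.dumps({
        "observation": "door is open" if done else "door is closed",
        "reward": 1.0 if done else 0.0,
        "done": done,
    })


env = LLMSimulatedEnv(fake_llm, "door is closed")
first = env.step("look around")
second = env.step("open door")
```

Because the agent only sees `step`, the same agent code could in principle be trained against the simulated environment and later evaluated against a real one.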
World models enable several planning strategies:
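One simple planning strategy, shown here purely as an illustration, is random shooting: sample many candidate action sequences, roll each out inside the model, and execute the first action of the best one. The function `plan_random_shooting` and the toy one-dimensional world are hypothetical; a learned model and reward head would take the place of `toy_model` and `toy_reward`.

```python
import random


def plan_random_shooting(model, reward_fn, state, actions,
                         horizon=5, candidates=200, gamma=0.95, seed=0):
    """Return the first action of the best sampled plan and its imagined return."""
    rng = random.Random(seed)
    best_return, best_first_action = float("-inf"), None
    for _ in range(candidates):
        plan = [rng.choice(actions) for _ in range(horizon)]
        s, total = state, 0.0
        for t, a in enumerate(plan):
            s = model(s, a)                       # imagined next state
            total += (gamma ** t) * reward_fn(s)  # discounted predicted reward
        if total > best_return:
            best_return, best_first_action = total, plan[0]
    return best_first_action, best_return


def toy_model(s, a):
    """Toy dynamics: the state is a position on a line, the action is a step."""
    return s + a


def toy_reward(s):
    """Toy reward: negative distance to a goal at position 3."""
    return -abs(3 - s)


action, ret = plan_random_shooting(toy_model, toy_reward, state=0, actions=[-1, 0, 1])
```

In a receding-horizon setup only `action` is executed, after which planning restarts from the newly observed state.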
World models are trained by minimizing a composite loss over predicted observations, rewards, and latent state distributions:
$$\mathcal{L}_{\text{world}} = \mathbb{E}\left[\sum_{t=1}^{T}\left(\underbrace{\|o_t - \hat{o}_t\|^2}_{\text{reconstruction}} + \underbrace{(r_t - \hat{r}_t)^2}_{\text{reward prediction}} + \underbrace{D_{\text{KL}}\left(q(z_t \mid h_t, o_t) \,\|\, p(z_t \mid h_t)\right)}_{\text{latent regularization}}\right)\right]$$
The KL term encourages the prior (imagination) distribution to match the posterior (observation-conditioned) distribution, ensuring that imagined rollouts remain faithful to real dynamics.
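The three terms of the loss can be computed directly for a short trajectory. The sketch below assumes one-dimensional Gaussian latents so that the KL divergence has a closed form. The function names and the dictionary layout of each step are hypothetical, and the outer expectation is left out (a single trajectory is scored).

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two 1-D Gaussians, in closed form."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5)


def world_model_loss(steps):
    """Sum of reconstruction, reward-prediction, and KL terms over time steps."""
    total = 0.0
    for s in steps:
        recon = sum((o - o_hat) ** 2 for o, o_hat in zip(s["obs"], s["obs_pred"]))
        reward = (s["reward"] - s["reward_pred"]) ** 2
        # posterior and prior are (mean, standard deviation) pairs
        kl = gaussian_kl(*s["posterior"], *s["prior"])
        total += recon + reward + kl
    return total


steps = [{
    "obs": [1.0, 2.0], "obs_pred": [1.0, 1.0],  # reconstruction error 1.0
    "reward": 1.0, "reward_pred": 0.5,          # reward error 0.25
    "posterior": (0.0, 1.0), "prior": (0.0, 1.0),  # identical, so KL is 0
}]
loss = world_model_loss(steps)
```

When the prior drifts away from the posterior, the KL term grows, which is the signal that imagined rollouts would diverge from what observations support.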
For embodied agents, world models bridge simulation and reality:
Advanced world agents maintain structured beliefs about environments and other agents:
Genie 3 (DeepMind, August 2025) represents a breakthrough in generative world models: