====== Voyager: Open-Ended Embodied Agent with LLMs ======
Voyager is an LLM-powered embodied agent by Wang et al. (2023) that achieves **lifelong learning** in Minecraft through three interconnected components: an automatic curriculum, an ever-growing skill library, and an iterative code generation loop. Without any model fine-tuning or gradient updates, Voyager uses GPT-4 as a blackbox to explore, acquire skills, and compose increasingly complex behaviors — obtaining 3.3x more unique items, traveling 2.3x longer distances, and unlocking technology tree milestones up to 15.3x faster than prior approaches.
<mermaid>
graph TD
AC[Automatic Curriculum] --> CG[Code Generation]
CG --> EX[Execute in Minecraft]
EX --> CHECK{Success?}
CHECK -->|No| CG
CHECK -->|Yes| SL[Store in Skill Library]
SL --> AC
</mermaid>
===== Three Core Components =====
Voyager's architecture integrates three modules that together enable open-ended exploration:
=== 1. Automatic Curriculum ===
The curriculum module proposes exploration goals by analyzing what the agent has not yet encountered:
* Queries GPT-4 with the overarching goal of discovering as many diverse things as possible, which acts as an in-context form of **novelty search**
* Prioritizes novel knowledge — if the agent has crafted many wooden tools but never explored caves, it suggests mining
* Creates a bottom-up discovery process without predefined task sequences
* Adapts dynamically based on the agent's current inventory, surroundings, and skill history
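Because the curriculum is an LLM query rather than a fixed task list, most of its logic lives in how the agent's state and task history are packed into the prompt. The sketch below is a hypothetical prompt builder, not Voyager's actual prompt: the function name, field layout and wording are assumptions, with the goal line paraphrasing the one described above.

```python
from typing import Dict, List


def build_curriculum_prompt(state: Dict[str, object],
                            completed: List[str],
                            failed: List[str]) -> str:
    """Assemble a next-task query for an LLM curriculum (hypothetical format)."""
    lines = [
        "Goal: discover as many diverse things as possible.",
        "Propose ONE next task that is novel but achievable from the state below.",
    ]
    # Current observation: inventory, nearby blocks, biome, and so on
    for key, value in state.items():
        lines.append(f"{key}: {value}")
    # Task history lets the LLM avoid repeats and revisit failures later
    lines.append("Completed tasks: " + (", ".join(completed) or "none"))
    lines.append("Failed tasks: " + (", ".join(failed) or "none"))
    return "\n".join(lines)


prompt = build_curriculum_prompt(
    {"inventory": {"oak_log": 4, "wooden_pickaxe": 1},
     "nearby_blocks": ["stone", "grass"],
     "biome": "plains"},
    completed=["Mine 1 wood log", "Craft a wooden pickaxe"],
    failed=[],
)
print(prompt)
```

The returned string would then be sent to the LLM, whose answer becomes the next goal for code generation.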
=== 2. Skill Library ===
A persistent, ever-growing repository of executable code skills:
* Each skill is a JavaScript function executable via the Mineflayer API
* Skills are indexed by description embeddings for semantic retrieval
* Tagged with metadata: description, items used, usage count, recency
* Enables **compositional behavior** — complex skills chain simpler ones (e.g., "build shelter" calls "chop wood" + "craft planks" + "place blocks")
* Prevents catastrophic forgetting by persisting across sessions
* Transfers to new Minecraft worlds for zero-shot generalization
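Compositional behavior means a stored skill can be executed by recursively expanding the skills it calls. The sketch below models this with a plain call graph in Python; real Voyager skills are JavaScript functions calling Mineflayer, so the skill names and the ''SKILL_CALLS'' table are hypothetical illustrations of the "build shelter" example above.

```python
from typing import Dict, List, Optional, Set

# Hypothetical call graph: each composite skill lists the skills it invokes.
SKILL_CALLS: Dict[str, List[str]] = {
    "build_shelter": ["chop_wood", "craft_planks", "place_blocks"],
    "build_shelter_with_door": ["build_shelter", "craft_door", "place_door"],
}


def expand(skill: str, calls: Dict[str, List[str]],
           stack: Optional[Set[str]] = None) -> List[str]:
    """Flatten a composite skill into the primitive skills it eventually runs."""
    stack = set() if stack is None else stack
    if skill in stack:
        raise ValueError(f"cyclic skill dependency at {skill!r}")
    children = calls.get(skill, [])
    if not children:  # not composite: treat as a primitive action
        return [skill]
    stack.add(skill)
    plan: List[str] = []
    for child in children:
        plan.extend(expand(child, calls, stack))
    stack.discard(skill)
    return plan


print(expand("build_shelter_with_door", SKILL_CALLS))
# ['chop_wood', 'craft_planks', 'place_blocks', 'craft_door', 'place_door']
```

Each new skill only has to reference existing ones, so capability grows without the LLM regenerating low-level steps.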
=== 3. Iterative Code Generation ===
A feedback-driven loop for synthesizing new skills:
- Retrieve top-k similar skills from the library via embedding similarity
- Provide GPT-4 with current inventory, nearby blocks, and retrieved skill examples
- GPT-4 generates executable JavaScript code for the proposed action
- Execute via Mineflayer; capture success/failure and environment state changes
- On failure, feed execution errors and self-verification feedback back to GPT-4
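- Repeat until self-verification confirms the task is complete; if the agent is still stuck after 4 rounds, it asks the curriculum for a different task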
- Verified skills are added to the skill library with metadata
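Self-verification decides whether a program actually achieved its task, and on failure supplies a critique for the next round. In Voyager this check is performed by GPT-4 acting as a critic; the sketch below swaps in a simple rule-based inventory comparison purely for illustration. The function name, arguments and critique wording are assumptions.

```python
from typing import Dict, Tuple


def verify_task(item: str, target: int,
                before: Dict[str, int], after: Dict[str, int]) -> Tuple[bool, str]:
    """Rule-based stand-in for an LLM critic (hypothetical).

    Succeeds if the inventory gained at least `target` of `item`;
    otherwise returns a critique to append to the next prompt.
    """
    gained = after.get(item, 0) - before.get(item, 0)
    if gained >= target:
        return True, ""
    return False, f"Obtained {gained} {item}, need {target - gained} more."


ok, critique = verify_task("iron_ingot", 3, {"iron_ingot": 0}, {"iron_ingot": 1})
print(ok, critique)  # False Obtained 1 iron_ingot, need 2 more.
```

An LLM critic can judge tasks that are not simple item counts (for example "build a shelter"), which is why Voyager uses one instead of hand-written rules.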
===== Code Example =====
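<code python>
# Simplified Voyager-style skill generation and retrieval loop.
# get_embedding, format_context, gpt4_generate and execute_in_minecraft are
# placeholders (an embedding model, a prompt builder, a GPT-4 call and a
# Mineflayer executor); supply real implementations before calling these.
from typing import Dict, List, Optional

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


class SkillLibrary:
    def __init__(self):
        self.skills: Dict[str, dict] = {}
        self.embeddings: Dict[str, np.ndarray] = {}

    def add_skill(self, name: str, code: str, description: str):
        # Store the verified program and index it by its description embedding
        self.skills[name] = {"code": code, "description": description}
        self.embeddings[name] = get_embedding(description)

    def retrieve(self, query: str, top_k: int = 5) -> List[dict]:
        # Rank stored skills by similarity to the query and keep the top k
        query_emb = get_embedding(query)
        scores = {
            name: cosine_similarity(query_emb, emb)
            for name, emb in self.embeddings.items()
        }
        top_names = sorted(scores, key=scores.get, reverse=True)[:top_k]
        return [self.skills[n] for n in top_names]


def iterative_code_generation(goal: str, library: SkillLibrary,
                              env_state: dict,
                              max_retries: int = 4) -> Optional[str]:
    similar_skills = library.retrieve(goal)
    context = format_context(env_state, similar_skills)
    for attempt in range(max_retries):
        code = gpt4_generate(goal, context)
        success, feedback = execute_in_minecraft(code)
        if success:
            # Voyager generates the description with an LLM; the goal stands in here
            library.add_skill(goal, code, description=goal)
            return code
        # Feed errors and critique back so the next attempt can self-correct
        context += f"\nAttempt {attempt + 1} failed: {feedback}"
    # Still stuck: the caller should ask the curriculum for a different task
    return None
</code>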
===== Benchmark Results =====
Evaluated in the MineDojo framework against ReAct, Reflexion, and AutoGPT baselines:
^ Metric ^ Voyager ^ Best Baseline ^ Improvement ^
| Unique items obtained (within 160 prompting iterations) | 63 | 19 (AutoGPT) | 3.3x |
| Travel distance | 2300+ blocks | 1000 blocks | 2.3x |
| Wooden tools (prompting iterations) | 6 | 92 (AutoGPT) | 15.3x faster |
| Stone tools (prompting iterations) | 11 | 94 (AutoGPT) | 8.5x faster |
| Iron tools (prompting iterations) | 21 | 135 (AutoGPT) | 6.4x faster |
| Diamond tools | Achieved (102 iterations, 1 of 3 runs) | Not achieved | Unique to Voyager |
Voyager is the only agent to unlock the complete Minecraft technology tree through to diamond-level tools.
===== Lifelong Learning =====
The lifelong learning paradigm enables continuous improvement:
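$$\mathcal{S}_{t+1} = \begin{cases} \mathcal{S}_t \cup \{s_{new}\} & \text{if } \text{verify}(s_{new}) \\ \mathcal{S}_t & \text{otherwise} \end{cases} \quad \text{where } s_{new} = \text{generate}(g_t, \mathcal{S}_t, o_t)$$
where $\mathcal{S}_t$ is the skill library at step $t$, $g_t$ is the curriculum-proposed goal, $o_t$ is the environment observation, and $\text{verify}$ is the self-verification check. The library grows monotonically, and skills compound: new programs can call previously stored ones, enabling behaviors that would be hard to produce in a single generation step.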
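The update rule can be read directly as code: generate a candidate from the goal, library and observation, then add it only if verification passes. The sketch below is a minimal illustration with toy ''generate'' and ''verify'' callables (hypothetical stand-ins for GPT-4 code generation and the critic); a skill is represented simply by its program text.

```python
from typing import Callable, FrozenSet

Skill = str  # hypothetical: a skill is represented by its program text


def lifelong_step(library: FrozenSet[Skill], goal: str, obs: dict,
                  generate: Callable[[str, FrozenSet[Skill], dict], Skill],
                  verify: Callable[[Skill, dict], bool]) -> FrozenSet[Skill]:
    """One update S_{t+1} from S_t: add the new skill only if it is verified."""
    s_new = generate(goal, library, obs)
    return (library | {s_new}) if verify(s_new, obs) else library


def toy_generate(goal: str, library: FrozenSet[Skill], obs: dict) -> Skill:
    return f"async function {goal.replace(' ', '_')}(bot) {{ /* ... */ }}"


def toy_verify(skill: Skill, obs: dict) -> bool:
    return "table" not in skill  # pretend the crafting-table task fails


library: FrozenSet[Skill] = frozenset()
sizes = []
for g in ["mine wood", "craft planks", "craft table"]:
    library = lifelong_step(library, g, {}, toy_generate, toy_verify)
    sizes.append(len(library))
print(sizes)  # [1, 2, 2]
```

A failed verification leaves the library unchanged rather than shrinking it, which is what makes the growth monotonic.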
===== References =====
* [[https://arxiv.org/abs/2305.16291|Wang et al. "Voyager: An Open-Ended Embodied Agent with Large Language Models" (arXiv:2305.16291)]]
* [[https://voyager.minedojo.org/|Voyager Project Website]]
* [[https://github.com/MineDojo/Voyager|Voyager GitHub Repository]]
* [[https://arxiv.org/abs/2206.08853|Fan et al. "MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge"]]
===== See Also =====
* [[gorilla|Gorilla — LLM trained for accurate API calling]]
* [[metagpt|MetaGPT — Multi-agent software development framework]]
* [[self_play_agents|Self-Play Agents — Self-improvement through agent interaction]]