Voyager: Open-Ended Embodied Agent with LLMs

Voyager is an LLM-powered embodied agent by Wang et al. (2023) that achieves lifelong learning in Minecraft through three interconnected components: an automatic curriculum, an ever-growing skill library, and an iterative code generation loop. Without any model fine-tuning or gradient updates, Voyager queries GPT-4 as a black box to explore, acquire skills, and compose increasingly complex behaviors. Compared with prior approaches, it obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks technology tree milestones up to 15.3x faster.

Three Core Components

Voyager's architecture integrates three modules that together enable open-ended exploration:

1. Automatic Curriculum

The curriculum module proposes exploration goals by analyzing what the agent has not yet encountered:

Prompts GPT-4 with the agent's state, its completed and failed tasks, and supplementary context drawn from a Minecraft wiki knowledge base to identify unexplored areas
Prioritizes novel knowledge: if the agent has crafted many wooden tools but never explored caves, it suggests mining
Creates a bottom-up discovery process without predefined task sequences
Adapts dynamically based on the agent's current inventory, surroundings, and skill history
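
One way to picture how the curriculum adapts to the agent's situation is a function that assembles a task-proposal prompt from the current state. This is a minimal sketch: the function name `build_curriculum_prompt`, its parameters, and the prompt wording are all illustrative assumptions, not Voyager's actual prompt.

```python
from typing import Dict, List


def build_curriculum_prompt(
    inventory: Dict[str, int],
    nearby_blocks: List[str],
    completed: List[str],
    failed: List[str],
) -> str:
    """Assemble a task-proposal prompt from the agent's current situation."""
    inv = ", ".join(f"{item} x{n}" for item, n in sorted(inventory.items())) or "empty"
    lines = [
        "Propose the next task so the agent discovers something new.",
        f"Inventory: {inv}",
        f"Nearby blocks: {', '.join(nearby_blocks) or 'none'}",
        f"Completed tasks: {', '.join(completed) or 'none'}",
        f"Failed tasks: {', '.join(failed) or 'none'}",
        "Do not repeat completed tasks; prefer a task slightly beyond them.",
    ]
    return "\n".join(lines)


# Hypothetical agent state, for illustration only.
prompt = build_curriculum_prompt(
    inventory={"oak_log": 3, "wooden_pickaxe": 1},
    nearby_blocks=["stone", "coal_ore"],
    completed=["Mine 3 oak logs", "Craft a wooden pickaxe"],
    failed=[],
)
print(prompt)
```

Because the prompt is rebuilt from fresh state on every call, the proposed task can change as soon as the inventory or surroundings do, without any predefined task sequence.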

2. Skill Library

A persistent, ever-growing repository of executable code skills:

Each skill is a JavaScript function executable via the Mineflayer API
Skills are indexed by description embeddings for semantic retrieval
Tagged with metadata: description, items used, usage count, recency
Enables compositional behavior — complex skills chain simpler ones (e.g., “build shelter” calls “chop wood” + “craft planks” + “place blocks”)
Mitigates catastrophic forgetting because learned skills are stored as code and persist across sessions
Transfers to new Minecraft worlds for zero-shot generalization
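
The compositional idea can be sketched in a few lines. Voyager's real skills are JavaScript functions for Mineflayer; the Python analogue below only illustrates how a complex skill can call simpler ones by name. The `SKILLS` registry, the `skill` decorator, and the logging of actions to a list are hypothetical stand-ins.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory registry; Voyager itself stores JavaScript source.
SKILLS: Dict[str, Callable[[List[str]], None]] = {}


def skill(name: str):
    """Register a function in the library under a readable name."""
    def register(fn):
        SKILLS[name] = fn
        return fn
    return register


@skill("chop wood")
def chop_wood(log: List[str]) -> None:
    log.append("chop wood")


@skill("craft planks")
def craft_planks(log: List[str]) -> None:
    log.append("craft planks")


@skill("place blocks")
def place_blocks(log: List[str]) -> None:
    log.append("place blocks")


@skill("build shelter")
def build_shelter(log: List[str]) -> None:
    # A complex skill is a sequence of calls into simpler, already stored skills.
    for step in ("chop wood", "craft planks", "place blocks"):
        SKILLS[step](log)


actions: List[str] = []
SKILLS["build shelter"](actions)
print(actions)  # ['chop wood', 'craft planks', 'place blocks']
```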

3. Iterative Code Generation

A feedback-driven loop for synthesizing new skills:

Retrieve top-k similar skills from the library via embedding similarity
Provide GPT-4 with current inventory, nearby blocks, and retrieved skill examples
GPT-4 generates executable JavaScript code for the proposed action
Execute via Mineflayer; capture success/failure and environment state changes
On failure, feed execution errors and self-verification feedback back to GPT-4
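Repeat until self-verification passes or the retry budget is exhausted (the paper requests a new task from the curriculum after 4 failed rounds of code generation)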
Verified skills are added to the skill library with metadata
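
The verification step in this loop can be illustrated with a simple check. The sketch below swaps in a rule-based inventory comparison rather than an LLM-based check, purely to show the shape of the (success, feedback) signal that is fed back on failure. The function `verify_by_inventory` and its `required_gain` parameter are assumptions made for illustration.

```python
from typing import Dict, Tuple


def verify_by_inventory(
    before: Dict[str, int],
    after: Dict[str, int],
    required_gain: Dict[str, int],
) -> Tuple[bool, str]:
    """Return (success, feedback) by comparing inventory before and after."""
    missing = []
    for item, needed in required_gain.items():
        gained = after.get(item, 0) - before.get(item, 0)
        if gained < needed:
            missing.append(f"{item}: gained {gained}, needed {needed}")
    if missing:
        # The feedback string is what a retry prompt would include.
        return False, "Goal not met (" + "; ".join(missing) + ")"
    return True, "Goal met"


ok, feedback = verify_by_inventory(
    before={"oak_log": 1},
    after={"oak_log": 2},
    required_gain={"oak_log": 3},
)
print(ok, feedback)  # False Goal not met (oak_log: gained 1, needed 3)
```

Returning a textual explanation alongside the boolean matters here: the loop needs something concrete to add to the next prompt, not just a pass/fail flag.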

Code Example

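# Simplified Voyager-style skill generation and retrieval loop.
# get_embedding is a toy stand-in; generate and execute are supplied by the
# caller (e.g. a GPT-4 client and a Mineflayer bridge).
import zlib
from typing import Callable, Dict, List, Optional, Tuple

import numpy as np


def get_embedding(text: str, dim: int = 64) -> np.ndarray:
    # Toy hashed bag-of-words vector; replace with a real embedding model.
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    return vec


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    return float(a @ b) / denom if denom else 0.0


def format_context(env_state: dict, skills: List[dict]) -> str:
    examples = "\n\n".join(s["code"] for s in skills)
    return f"Environment: {env_state}\nRetrieved skills:\n{examples}"


class SkillLibrary:
    def __init__(self):
        self.skills: Dict[str, dict] = {}
        self.embeddings: Dict[str, np.ndarray] = {}

    def add_skill(self, name: str, code: str, description: str):
        self.skills[name] = {"code": code, "description": description}
        self.embeddings[name] = get_embedding(description)

    def retrieve(self, query: str, top_k: int = 5) -> List[dict]:
        # Rank stored skills by similarity between the query and each description.
        query_emb = get_embedding(query)
        scores = {
            name: cosine_similarity(query_emb, emb)
            for name, emb in self.embeddings.items()
        }
        top_names = sorted(scores, key=scores.get, reverse=True)[:top_k]
        return [self.skills[n] for n in top_names]


def iterative_code_generation(
    goal: str,
    library: SkillLibrary,
    env_state: dict,
    generate: Callable[[str, str], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_retries: int = 5,
) -> Optional[str]:
    similar_skills = library.retrieve(goal)
    context = format_context(env_state, similar_skills)

    for attempt in range(max_retries):
        code = generate(goal, context)      # e.g. a GPT-4 call
        success, feedback = execute(code)   # e.g. run through Mineflayer
        if success:
            library.add_skill(goal, code, description=goal)
            return code
        # Carry the failure forward so the next attempt can correct it.
        context += f"\nAttempt {attempt + 1} failed: {feedback}"
    return None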

Benchmark Results

Evaluated in the MineDojo framework against ReAct, Reflexion, and AutoGPT baselines:

Metric	Voyager	Best Baseline	Improvement
Unique items obtained	63	19 (AutoGPT)	3.3x
Travel distance	2300+ blocks	1000 blocks	2.3x
Wooden tools (time)	2 min	30.6 min (ReAct)	15.3x faster
Stone tools (time)	5 min	42.5 min	8.5x faster
Iron tools (time)	15 min	96 min	6.4x faster
Diamond tools	Achieved	Not achieved	Unique to Voyager

Voyager is the only agent to unlock the complete Minecraft technology tree through to diamond-level tools.

Lifelong Learning

The lifelong learning paradigm enables continuous improvement:

<latex>\mathcal{S}_{t+1} = \mathcal{S}_t \cup \{s_{new}\} \text{ where } s_{new} = \text{verify}(\text{generate}(g_t, \mathcal{S}_t, o_t))</latex>

where <latex>\mathcal{S}_t</latex> is the skill library at time <latex>t</latex>, <latex>g_t</latex> is the curriculum-proposed goal, and <latex>o_t</latex> is the environment observation. The library grows monotonically, and skills compound — enabling behaviors impossible through any single generation step.
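
The update rule can be read as a single function applied repeatedly. The sketch below is one interpretation under stated assumptions: skills are represented as strings, `generate` and `verify` are caller-supplied callables, and a failed verification is modeled as `verify` returning `None` so that nothing is added. None of these names come from the paper.

```python
from typing import Callable, FrozenSet, Optional


def update_library(
    library: FrozenSet[str],
    goal: str,
    observation: str,
    generate: Callable[[str, FrozenSet[str], str], str],
    verify: Callable[[str], Optional[str]],
) -> FrozenSet[str]:
    """One step of S_{t+1} = S_t union {verify(generate(g_t, S_t, o_t))}."""
    candidate = generate(goal, library, observation)
    verified = verify(candidate)
    # A rejected candidate leaves the library unchanged, so it never shrinks.
    return library if verified is None else library | {verified}


# Toy stand-ins for the LLM and the verifier (assumptions for illustration).
def toy_generate(goal: str, library: FrozenSet[str], observation: str) -> str:
    return f"skill:{goal}"


def toy_verify(candidate: str) -> Optional[str]:
    return None if "impossible" in candidate else candidate


library: FrozenSet[str] = frozenset()
for goal in ["mine wood", "impossible task", "craft table"]:
    new_library = update_library(library, goal, "forest", toy_generate, toy_verify)
    assert library <= new_library  # the library only ever grows
    library = new_library
print(sorted(library))  # ['skill:craft table', 'skill:mine wood']
```

Using an immutable `frozenset` makes the monotonic-growth property easy to check: each step returns either the same set or a strict superset.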
