====== Voyager: Open-Ended Embodied Agent with LLMs ======

Voyager is an LLM-powered embodied agent by Wang et al. (2023) that pursues **lifelong learning** in Minecraft through three interconnected components: an automatic curriculum, an ever-growing skill library, and an iterative code generation loop. Without any model fine-tuning or gradient updates, Voyager queries GPT-4 as a black box to explore, acquire skills, and compose increasingly complex behaviors. Compared with prior approaches, it obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks technology tree milestones up to 15.3x faster.


<mermaid>
graph TD
    AC[Automatic Curriculum] --> CG[Code Generation]
    CG --> EX[Execute in Minecraft]
    EX --> CHECK{Success?}
    CHECK -->|No| CG
    CHECK -->|Yes| SL[Store in Skill Library]
    SL --> AC
</mermaid>

===== Three Core Components =====

Voyager's architecture integrates three modules that together enable open-ended exploration:

=== 1. Automatic Curriculum ===

The curriculum module proposes exploration goals by analyzing what the agent has not yet encountered:

  * Asks GPT-4 to propose the next task, guided by the overarching goal of discovering as many diverse things as possible
  * Prioritizes novel knowledge: if the agent has crafted many wooden tools but never explored caves, it suggests mining
  * Creates a bottom-up discovery process without predefined task sequences
  * Adapts dynamically based on the agent's current inventory, surroundings, and skill history
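
As a rough illustration of how the inputs listed above might be assembled, the sketch below builds a curriculum prompt from the agent's state and task history. The ''AgentState'' fields, the ''build_curriculum_prompt'' helper, and the prompt wording are all hypothetical; this is not Voyager's actual prompt.

<code python>
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentState:
    """Hypothetical snapshot of what the curriculum sees."""
    inventory: Dict[str, int] = field(default_factory=dict)
    nearby_blocks: List[str] = field(default_factory=list)
    completed_tasks: List[str] = field(default_factory=list)
    failed_tasks: List[str] = field(default_factory=list)


def build_curriculum_prompt(state: AgentState) -> str:
    """Assemble a text prompt asking an LLM for the next exploration goal."""
    inventory = ", ".join(
        f"{item} x{count}" for item, count in sorted(state.inventory.items())
    ) or "empty"
    lines = [
        "Propose one new task that helps the agent discover something new.",
        f"Inventory: {inventory}",
        f"Nearby blocks: {', '.join(state.nearby_blocks) or 'none'}",
        f"Completed tasks: {', '.join(state.completed_tasks) or 'none'}",
        f"Failed tasks: {', '.join(state.failed_tasks) or 'none'}",
    ]
    return "\n".join(lines)


state = AgentState(
    inventory={"wooden_pickaxe": 1, "oak_log": 3},
    nearby_blocks=["stone", "coal_ore"],
    completed_tasks=["Mine 1 wood log", "Craft a wooden pickaxe"],
)
prompt = build_curriculum_prompt(state)
print(prompt)
</code>

Listing completed and failed tasks in the prompt is one simple way to steer the model away from repeating work and toward goals the agent has not yet attempted.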

=== 2. Skill Library ===

A persistent, ever-growing repository of executable code skills:

  * Each skill is a JavaScript function executable via the Mineflayer API
  * Skills are indexed by description embeddings for semantic retrieval
  * Tagged with metadata: description, items used, usage count, recency
  * Enables **compositional behavior**: complex skills chain simpler ones (e.g., "build shelter" calls "chop wood" + "craft planks" + "place blocks")
  * Prevents catastrophic forgetting by persisting across sessions
  * Transfers to new Minecraft worlds for zero-shot generalization
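
Persistence across sessions can be sketched with plain JSON serialization. The file name, the ''save_skills'' and ''load_skills'' helpers, and the record layout are assumptions made for illustration, not the storage format used by the Voyager repository.

<code python>
import json
import tempfile
from pathlib import Path
from typing import Dict


def save_skills(skills: Dict[str, dict], path: Path) -> None:
    """Write the skill records to disk so a later session can reload them."""
    path.write_text(json.dumps(skills, indent=2), encoding="utf-8")


def load_skills(path: Path) -> Dict[str, dict]:
    """Load previously saved skills, or start empty if nothing was saved yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "skills.json"

    # Session 1: the library starts empty, learns a skill, and persists it.
    skills = load_skills(store)
    skills["craftPlanks"] = {
        "description": "Craft oak planks from oak logs",
        "code": "async function craftPlanks(bot) { /* placeholder body */ }",
    }
    save_skills(skills, store)

    # Session 2 (simulated): a fresh load sees the skill learned earlier.
    restored = load_skills(store)

print(sorted(restored))
</code>

Because the skill is stored as source code plus a description, the same file could in principle be loaded in a different world, which is the property the transfer bullet above relies on.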

=== 3. Iterative Code Generation ===

A feedback-driven loop for synthesizing new skills:

  - Retrieve top-k similar skills from the library via embedding similarity
  - Provide GPT-4 with current inventory, nearby blocks, and retrieved skill examples
  - GPT-4 generates executable JavaScript code for the proposed action
  - Execute via Mineflayer; capture success/failure and environment state changes
  - On failure, feed execution errors and self-verification feedback back to GPT-4
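  - Repeat until self-verification succeeds or the attempt budget runs out (the paper requests a new task from the curriculum after 4 rounds of code generation)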
  - Verified skills are added to the skill library with metadata

===== Code Example =====

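<code python>
# Simplified Voyager-style skill generation and retrieval loop.
# get_embedding is a toy stand-in for a real embedding model, and the
# generate/execute callables stand in for GPT-4 and a Mineflayer runner.
import zlib
from typing import Callable, Dict, List, Optional, Tuple

import numpy as np


def get_embedding(text: str, dim: int = 64) -> np.ndarray:
    """Hash each token into a fixed-size bag-of-words vector."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    return float(a @ b) / denom if denom else 0.0


class SkillLibrary:
    def __init__(self):
        self.skills: Dict[str, dict] = {}
        self.embeddings: Dict[str, np.ndarray] = {}

    def add_skill(self, name: str, code: str, description: str) -> None:
        self.skills[name] = {"code": code, "description": description}
        self.embeddings[name] = get_embedding(description)

    def retrieve(self, query: str, top_k: int = 5) -> List[dict]:
        """Return the top_k skills whose descriptions best match the query."""
        query_emb = get_embedding(query)
        scores = {
            name: cosine_similarity(query_emb, emb)
            for name, emb in self.embeddings.items()
        }
        top_names = sorted(scores, key=scores.get, reverse=True)[:top_k]
        return [self.skills[n] for n in top_names]


def format_context(env_state: dict, skills: List[dict]) -> str:
    examples = "\n\n".join(s["code"] for s in skills)
    return f"Environment state: {env_state}\nRetrieved skills:\n{examples}"


def iterative_code_generation(
    goal: str,
    library: SkillLibrary,
    env_state: dict,
    generate: Callable[[str, str], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_retries: int = 5,
) -> Optional[str]:
    similar_skills = library.retrieve(goal)
    context = format_context(env_state, similar_skills)

    for attempt in range(max_retries):
        code = generate(goal, context)       # e.g. a GPT-4 call
        success, feedback = execute(code)    # e.g. run via Mineflayer
        if success:
            library.add_skill(goal, code, description=goal)
            return code
        # Feed the failure back so the next attempt can correct it.
        context += f"\nAttempt {attempt + 1} failed: {feedback}"
    return None
</code>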

===== Benchmark Results =====

Evaluated in the MineDojo framework against ReAct, Reflexion, and AutoGPT baselines:

^ Metric ^ Voyager ^ Best Baseline ^ Improvement ^
| Unique items obtained | 63 | 19 (AutoGPT) | 3.3x |
| Travel distance | 2300+ blocks | 1000 blocks | 2.3x |
| Wooden tools (prompting iterations) | 6 | 92 (AutoGPT) | 15.3x faster |
| Stone tools (prompting iterations) | 11 | 94 (AutoGPT) | 8.5x faster |
| Iron tools (prompting iterations) | 21 | 135 (AutoGPT) | 6.4x faster |
| Diamond tools | Achieved | Not achieved | Unique to Voyager |

Among the compared agents, Voyager is the only one to reach the diamond tool level of the Minecraft technology tree.

===== Lifelong Learning =====

The lifelong learning paradigm enables continuous improvement:

<latex>\mathcal{S}_{t+1} = \mathcal{S}_t \cup \{s_{new}\} \text{ where } s_{new} = \text{verify}(\text{generate}(g_t, \mathcal{S}_t, o_t))</latex>

where <latex>\mathcal{S}_t</latex> is the skill library at time <latex>t</latex>, <latex>g_t</latex> is the curriculum-proposed goal, and <latex>o_t</latex> is the environment observation. Only skills that pass verification are added, so the library grows monotonically. Skills also compound, enabling behaviors that no single generation step could produce.
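
The update rule can be read as a loop in which only verified skills enter the library. The sketch below uses toy stand-ins (''propose_goal'', ''generate'', and ''verify'' are hypothetical callables, not Voyager functions) to show that a failed verification leaves the library unchanged while a success grows it.

<code python>
from typing import Callable, Dict


def lifelong_step(
    library: Dict[str, str],
    observation: dict,
    propose_goal: Callable[[Dict[str, str], dict], str],
    generate: Callable[[str, Dict[str, str], dict], str],
    verify: Callable[[str, str], bool],
) -> Dict[str, str]:
    """One step of S_{t+1} = S_t ∪ {s_new}; returns S_t unchanged on failure."""
    goal = propose_goal(library, observation)          # g_t
    candidate = generate(goal, library, observation)   # generate(g_t, S_t, o_t)
    if not verify(goal, candidate):
        return library
    return {**library, goal: candidate}


# Toy stand-ins so the loop runs without an LLM or a game server.
GOALS = ["chop wood", "craft planks", "mine diamond"]


def propose_goal(library, observation):
    # Pick the first goal that has no stored skill yet.
    return next(g for g in GOALS if g not in library)


def generate(goal, library, observation):
    return f"// code for {goal}, reusing {sorted(library)}"


def verify(goal, code):
    return goal != "mine diamond"   # pretend this task is still too hard


library: Dict[str, str] = {}
sizes = []
for _ in range(4):
    library = lifelong_step(library, {"biome": "forest"}, propose_goal, generate, verify)
    sizes.append(len(library))

print(sizes)
</code>

The recorded library sizes never decrease, matching the monotonic growth described above; the unverified "mine diamond" candidate is simply discarded and retried on the next step.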

===== References =====

  * [[https://arxiv.org/abs/2305.16291|Wang et al. "Voyager: An Open-Ended Embodied Agent with Large Language Models" (arXiv:2305.16291)]]
  * [[https://voyager.minedojo.org/|Voyager Project Website]]
  * [[https://github.com/MineDojo/Voyager|Voyager GitHub Repository]]
  * [[https://arxiv.org/abs/2206.08853|Fan et al. "MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge"]]

===== See Also =====

  * [[gorilla|Gorilla — LLM trained for accurate API calling]]
  * [[metagpt|MetaGPT — Multi-agent software development framework]]
  * [[self_play_agents|Self-Play Agents — Self-improvement through agent interaction]]