====== Voyager: Open-Ended Embodied Agent with LLMs ======

Voyager is an LLM-powered embodied agent by Wang et al. (2023) that achieves **lifelong learning** in Minecraft through three interconnected components: an automatic curriculum, an ever-growing skill library, and an iterative code generation loop. Without any model fine-tuning or gradient updates, Voyager queries GPT-4 as a black box to explore, acquire skills, and compose increasingly complex behaviors. Compared with prior approaches, it obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks technology tree milestones up to 15.3x faster.

<mermaid>
graph TD
    AC[Automatic Curriculum] --> CG[Code Generation]
    CG --> EX[Execute in Minecraft]
    EX --> CHECK{Success?}
    CHECK -->|No| CG
    CHECK -->|Yes| SL[Store in Skill Library]
    SL --> AC
</mermaid>

===== Three Core Components =====

Voyager's architecture integrates three modules that together enable open-ended exploration:

=== 1. Automatic Curriculum ===

The curriculum module proposes exploration goals by analyzing what the agent has not yet encountered:

  * GPT-4 proposes the next task from the agent's current state and exploration progress, under the overarching goal of discovering as many diverse things as possible
  * Prioritizes novel knowledge: if the agent has crafted many wooden tools but never explored caves, it suggests mining
  * Creates a bottom-up discovery process without predefined task sequences
  * Adapts dynamically based on the agent's current inventory, surroundings, and skill history

=== 2. Skill Library ===

A persistent, ever-growing repository of executable code skills:

  * Each skill is a JavaScript function executable via the Mineflayer API
  * Skills are indexed by description embeddings for semantic retrieval
  * Tagged with metadata: description, items used, usage count, recency
  * Enables **compositional behavior**: complex skills chain simpler ones (e.g., "build shelter" calls "chop wood" + "craft planks" + "place blocks")
  * Mitigates catastrophic forgetting by persisting across sessions
  * Transfers to new Minecraft worlds for zero-shot generalization

=== 3. Iterative Code Generation ===

A feedback-driven loop for synthesizing new skills:

  - Retrieve the top-k most similar skills from the library via embedding similarity
  - Provide GPT-4 with the current inventory, nearby blocks, and the retrieved skill examples
  - GPT-4 generates executable JavaScript code for the proposed task
  - Execute the code via Mineflayer; capture success or failure and the resulting environment state changes
  - On failure, feed execution errors and self-verification feedback back to GPT-4
  - Repeat until the task is verified or the retry budget is exhausted (the paper moves on to a different task after 4 rounds)
  - Add each verified skill to the skill library with its metadata

===== Code Example =====

<code python>
# Simplified Voyager-style skill generation and retrieval loop.
# get_embedding, format_context, gpt4_generate, and execute_in_minecraft are
# placeholders (embedding model, prompt builder, LLM call, Mineflayer bridge)
# that must be supplied by the caller's environment; they are not defined here.
import numpy as np
from typing import Dict, List, Optional


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


class SkillLibrary:
    def __init__(self):
        self.skills: Dict[str, dict] = {}
        self.embeddings: Dict[str, np.ndarray] = {}

    def add_skill(self, name: str, code: str, description: str):
        # Store the code and index it by the embedding of its description.
        self.skills[name] = {"code": code, "description": description}
        self.embeddings[name] = get_embedding(description)

    def retrieve(self, query: str, top_k: int = 5) -> List[dict]:
        # Rank stored skills by similarity to the query and keep the best top_k.
        query_emb = get_embedding(query)
        scores = {
            name: cosine_similarity(query_emb, emb)
            for name, emb in self.embeddings.items()
        }
        top_names = sorted(scores, key=scores.get, reverse=True)[:top_k]
        return [self.skills[n] for n in top_names]


def iterative_code_generation(goal: str, library: SkillLibrary,
                              env_state: dict, max_retries: int = 4) -> Optional[str]:
    similar_skills = library.retrieve(goal)
    context = format_context(env_state, similar_skills)

    for attempt in range(max_retries):
        code = gpt4_generate(goal, context)
        success, feedback = execute_in_minecraft(code)
        if success:
            # Only verified code is stored as a reusable skill.
            library.add_skill(goal, code, description=goal)
            return code
        # Append the failure feedback so the next attempt can correct it.
        context += f"\nAttempt {attempt + 1} failed: {feedback}"
    return None  # Retry budget exhausted without a verified skill.
</code>

===== Benchmark Results =====

Evaluated in the MineDojo framework against ReAct, Reflexion, and AutoGPT baselines. Tech tree speed is measured in prompting iterations, as reported in the paper:

^ Metric ^ Voyager ^ Best Baseline ^ Improvement ^
| Unique items obtained | 63 | 19 (AutoGPT) | 3.3x |
| Travel distance | 2300+ blocks | 1000 blocks | 2.3x |
| Wooden tools (prompting iterations) | 6 ± 2 | 92 ± 72 (AutoGPT) | 15.3x faster |
| Stone tools (prompting iterations) | 11 ± 2 | 94 ± 72 (AutoGPT) | 8.5x faster |
| Iron tools (prompting iterations) | 21 ± 7 | 135 ± 103 (AutoGPT) | 6.4x faster |
| Diamond tools | Achieved (1 of 3 runs) | Not achieved | Unique to Voyager |

Voyager is the only agent to unlock the complete Minecraft technology tree through to diamond-level tools.

===== Lifelong Learning =====

The lifelong learning paradigm enables continuous improvement:

$$\mathcal{S}_{t+1} = \mathcal{S}_t \cup \{s_{\text{new}}\}, \quad s_{\text{new}} = \text{verify}(\text{generate}(g_t, \mathcal{S}_t, o_t))$$

where $\mathcal{S}_t$ is the skill library at time $t$, $g_t$ is the curriculum-proposed goal, and $o_t$ is the environment observation. The library grows monotonically: skills are only added, never removed, and they compound, enabling behaviors that no single generation step could produce.

===== References =====

  * [[https://arxiv.org/abs/2305.16291|Wang et al. "Voyager: An Open-Ended Embodied Agent with Large Language Models" (arXiv:2305.16291)]]
  * [[https://voyager.minedojo.org/|Voyager Project Website]]
  * [[https://github.com/MineDojo/Voyager|Voyager GitHub Repository]]
  * [[https://arxiv.org/abs/2206.08853|Fan et al. "MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge" (arXiv:2206.08853)]]

===== See Also =====

  * [[gorilla|Gorilla: LLM trained for accurate API calling]]
  * [[metagpt|MetaGPT: Multi-agent software development framework]]
  * [[self_play_agents|Self-Play Agents: Self-improvement through agent interaction]]
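
===== Worked Example: Library Update =====

The update rule in the Lifelong Learning section can be traced with a toy run. The sketch below is only an illustration under stated assumptions: ''lifelong_step'', ''toy_generate'', and ''toy_verify'' are hypothetical names, the "skills" are plain strings rather than Mineflayer programs, and the verifier is a stand-in for execution plus self-verification. It shows that a failed goal leaves the library unchanged while a verified one extends it.

```python
from typing import Callable, Dict, Optional

# Hypothetical types for illustration; not part of the Voyager codebase.
Skill = str
Library = Dict[str, Skill]


def lifelong_step(
    library: Library,
    goal: str,
    observation: dict,
    generate: Callable[[str, Library, dict], Skill],
    verify: Callable[[Skill], Optional[Skill]],
) -> Library:
    """Return S_{t+1}: S_t plus the new skill, or S_t unchanged if verification fails."""
    candidate = generate(goal, library, observation)
    verified = verify(candidate)
    if verified is None:
        return library
    updated = dict(library)  # copy so the previous library is not mutated
    updated[goal] = verified
    return updated


def toy_generate(goal: str, library: Library, observation: dict) -> Skill:
    # Stand-in for the LLM call: the new skill lists the skills it may reuse.
    return f"// {goal}: uses {sorted(library)}"


def toy_verify(skill: Skill) -> Optional[Skill]:
    # Stand-in for execution and self-verification: reject anything involving lava.
    return None if "lava" in skill else skill


library: Library = {}
for goal in ["chop wood", "craft planks", "swim in lava", "build shelter"]:
    library = lifelong_step(library, goal, {}, toy_generate, toy_verify)

print(sorted(library))  # ['build shelter', 'chop wood', 'craft planks']
```

The rejected goal ("swim in lava") never enters the library, and the later "build shelter" skill is generated with both earlier skills available, which is the compounding effect described above.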