====== ExpeL: Experiential Learning for LLM Agents ======

**ExpeL** (Experiential Learning) is an autonomous agent framework introduced by Zhao et al. (2023) that enables LLM agents to learn from past experiences **without any gradient-based parameter updates**. With **419 citations**, it demonstrates that agents can extract, store, and reuse natural language insights from both successes and failures, achieving progressive improvement over time.(([[https://arxiv.org/abs/2308.10144|Zhao et al. "ExpeL: LLM Agents Are Experiential Learners" (2023)]]))(([[https://andrewzh112.github.io/expel/|ExpeL Project Page]]))(([[https://arxiv.org/abs/2303.11366|Shinn et al. "Reflexion: Language Agents with Verbal Reinforcement Learning" (2023)]]))

[[https://arxiv.org/abs/2308.10144|arXiv:2308.10144]](([[https://arxiv.org/abs/2210.03629|Yao et al. "ReAct: Synergizing Reasoning and Acting in Language Models" (2022)]]))

===== Core Mechanism =====

ExpeL operates on the principle that an LLM can reflect on its own trajectories to distill reusable knowledge. Unlike fine-tuning approaches, all learning happens at inference time through a structured memory system. The learning objective can be expressed as:

$$\pi^*(a_t \mid s_t) = \arg\max_a \mathbb{E}\left[R \mid s_t, a, \mathcal{I}, \mathcal{E}\right]$$

where $\pi^*$ is the improved policy, $s_t$ is the current state, $\mathcal{I}$ is the set of extracted insights, and $\mathcal{E}$ is the experience pool of past trajectories.

===== Three-Stage Pipeline =====

==== Stage 1: Experience Gathering ====

The agent interacts with training tasks using a base planner (ReAct or Act) powered by GPT-3.5-turbo. Each interaction produces a trajectory $\tau = (s_0, a_0, o_0, \ldots, s_T)$ containing states, actions, and observations. Both successful and failed trajectories are stored.
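The trajectory record gathered in Stage 1 can be sketched as a simple data structure. This is an illustrative assumption, not the paper's actual implementation; the ''Step'' and ''Trajectory'' names and fields are hypothetical:

```python
# Illustrative sketch of a Stage 1 trajectory record (field names are
# assumptions for illustration, not taken from the ExpeL codebase).
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    thought: str       # ReAct-style reasoning before acting
    action: str        # e.g. "Search[Tesla, Inc.]"
    observation: str   # feedback returned by the environment


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)
    success: bool = False  # set after the environment checks the outcome


traj = Trajectory(task="When was the company founded?")
traj.steps.append(Step(thought="I should search for the company first.",
                       action="Search[company]",
                       observation="The company was founded in 2003."))
traj.success = True
```

Storing failures alongside successes in this uniform format is what later allows Stage 2 to contrast the two sets.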
==== Stage 2: Insight Extraction ====

The LLM analyzes collected trajectories to extract **natural language insights** -- abstract rules and strategies that generalize across tasks:

  * Compare successful vs. failed trajectories on similar tasks
  * Identify patterns that led to success or failure
  * Distill task-agnostic strategies (e.g., "Always verify search results before answering")

==== Stage 3: Task Execution ====

For new tasks, the agent retrieves relevant past trajectories and insights from memory and incorporates them into its reasoning context for more informed decision-making.

===== System Architecture =====

<code>
graph TD
    A[Training Tasks] --> B[Experience Gathering]
    B --> C[Success Trajectories]
    B --> D[Failure Trajectories]
    C --> E[Insight Extraction via LLM]
    D --> E
    E --> F[Natural Language Insights]
    C --> G[Experience Pool]
    D --> G
    F --> H[Insight Memory Bank]
    G --> H
    I[New Task] --> J[Memory Retrieval]
    H --> J
    J --> K[Similar Trajectories]
    J --> L[Relevant Insights]
    K --> M[Augmented Reasoning Context]
    L --> M
    M --> N[ReAct Agent Execution]
    N --> O[Task Solution]
    N --> P[New Experience]
    P --> G
</code>

===== Code Example =====

<code python>
# Simplified ExpeL agent with experience-based learning
class ExpeL:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever
        self.experience_pool = []
        self.insights = []

    def gather_experience(self, task, environment):
        """Stage 1: run the base planner and store the trajectory."""
        trajectory = self._run_react(task, environment)
        trajectory["success"] = environment.check_success()
        self.experience_pool.append(trajectory)
        return trajectory

    def extract_insights(self, batch_size=5):
        """Stage 2: contrast successes and failures to distill rules."""
        recent = self.experience_pool[-batch_size:]
        successes = [t for t in recent if t["success"]]
        failures = [t for t in recent if not t["success"]]
        prompt = (
            f"Compare these successful trajectories:\n{successes}\n"
            f"With these failures:\n{failures}\n"
            "Extract general insights as rules for future tasks."
        )
        new_insights = self.llm.generate(prompt)
        self.insights.extend(self._parse_insights(new_insights))

    def solve(self, task, environment):
        """Stage 3: retrieve relevant memories, then act with them in context."""
        similar_experiences = self.retriever.search(
            query=task, pool=self.experience_pool, top_k=3
        )
        relevant_insights = self.retriever.search(
            query=task, pool=self.insights, top_k=5
        )
        context = self._build_context(
            task, similar_experiences, relevant_insights
        )
        return self._run_react(context, environment)
</code>

===== Key Results =====

^ Benchmark ^ Task Type ^ ExpeL vs. Baselines ^
| HotpotQA | Multi-hop QA | Outperforms ReAct, Act, and imitation learning |
| ALFWorld | Household tasks | Consistent gains with experience accumulation |
| WebShop | E-commerce navigation | Surpasses strong baselines |

  * Performance improves progressively as more experiences accumulate
  * Positive forward transfer: insights from source tasks benefit unseen target tasks
  * Ablation confirms synergy between trajectory retrieval and insight extraction
  * All learning occurs at inference time with no weight updates

===== See Also =====

  * [[agentverse|AgentVerse: Multi-Agent Collaboration]]
  * [[reasoning_via_planning|RAP: Reasoning via Planning]]
  * [[webgpt|WebGPT: Browser-Assisted Question Answering]]

===== References =====