====== ExpeL: Experiential Learning for LLM Agents ======
**ExpeL** (Experiential Learning) is an autonomous agent framework introduced by Zhao et al. (2023) that enables LLM agents to learn from past experiences **without any gradient-based parameter updates**. With **419 citations**, it demonstrates that agents can extract, store, and reuse natural language insights from both successes and failures, achieving progressive improvement over time.(([[https://arxiv.org/abs/2308.10144|Zhao et al. "ExpeL: LLM Agents Are Experiential Learners" (2023)]]))(([[https://andrewzh112.github.io/expel/|ExpeL Project Page]]))(([[https://arxiv.org/abs/2303.11366|Shinn et al. "Reflexion: Language Agents with Verbal Reinforcement Learning" (2023)]]))
[[https://arxiv.org/abs/2308.10144|arXiv:2308.10144]](([[https://arxiv.org/abs/2210.03629|Yao et al. "ReAct: Synergizing Reasoning and Acting in Language Models" (2022)]]))
===== Core Mechanism =====
ExpeL operates on the principle that an LLM can reflect on its own trajectories to distill reusable knowledge. Unlike fine-tuning approaches, all learning happens at the inference level through a structured memory system.
The learning objective can be expressed as:
$$\pi^*(a_t | s_t) = \arg\max_a \mathbb{E}\left[R \mid s_t, a, \mathcal{I}, \mathcal{E}\right]$$
where $\pi^*$ is the improved policy, $s_t$ is the current state, $\mathcal{I}$ is the set of extracted insights, and $\mathcal{E}$ is the experience pool of past trajectories.
===== Three-Stage Pipeline =====
==== Stage 1: Experience Gathering ====
The agent interacts with training tasks using a base planner (ReAct or Act) powered by GPT-3.5-turbo. Each interaction produces a trajectory $\tau = (s_0, a_0, o_0, \ldots, s_T)$ containing states, actions, and observations. Both successful and failed trajectories are stored.
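A trajectory of this form can be recorded with a minimal data structure. The sketch below is illustrative only: the `Step`/`Trajectory` field names are assumptions, not the paper's exact schema, and a ReAct step is simplified to a (thought, action, observation) triple.

<code python>
from dataclasses import dataclass, field

@dataclass
class Step:
    thought: str       # the agent's reasoning text
    action: str        # e.g. "Search[...]" in ReAct-style environments
    observation: str   # environment feedback for the action

@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)
    success: bool = False  # both outcomes are kept in the experience pool

    def add_step(self, thought, action, observation):
        self.steps.append(Step(thought, action, observation))

traj = Trajectory(task="Find the capital of France")
traj.add_step("I should search for it.", "Search[capital of France]", "Paris")
traj.success = True
</code>

Keeping failed trajectories alongside successful ones is what later lets the insight-extraction stage contrast the two.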
==== Stage 2: Insight Extraction ====
The LLM analyzes collected trajectories to extract **natural language insights** -- abstract rules and strategies that generalize across tasks:
* Compare successful vs. failed trajectories on similar tasks
* Identify patterns that led to success or failure
* Distill task-agnostic strategies (e.g., "Always verify search results before answering")
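In the paper, the extracted insight list is maintained through LLM-emitted operations (ADD, EDIT, UPVOTE, DOWNVOTE) with an importance counter per insight. The sketch below applies such operations to a simple list; the exact counter values and the method signatures are assumptions for illustration.

<code python>
class InsightList:
    """Insight memory maintained via ExpeL-style edit operations."""

    def __init__(self):
        self.insights = []  # list of [text, importance] pairs

    def apply(self, op, idx=None, text=None):
        if op == "ADD":
            self.insights.append([text, 2])      # new insights start with a base importance
        elif op == "UPVOTE":
            self.insights[idx][1] += 1
        elif op == "DOWNVOTE":
            self.insights[idx][1] -= 1
            if self.insights[idx][1] <= 0:
                self.insights.pop(idx)           # insights voted down to zero are dropped
        elif op == "EDIT":
            self.insights[idx][0] = text

bank = InsightList()
bank.apply("ADD", text="Verify search results before answering.")
bank.apply("UPVOTE", idx=0)  # reinforced by another successful trajectory
</code>

The counter gives the agent a cheap way to prune insights that repeatedly fail to help, while frequently reinforced rules persist.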
==== Stage 3: Task Execution ====
For new tasks, the agent retrieves relevant past trajectories and insights from memory, incorporating them into its reasoning context for more informed decision-making.
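Retrieval can be sketched as top-k nearest-neighbor search over task embeddings. ExpeL uses dense sentence embeddings for this; the bag-of-words `embed` below is a stand-in chosen only to keep the example self-contained, and the example pool strings are invented.

<code python>
from collections import Counter
import math

def embed(text):
    # Toy embedding: word-count vector (real systems use dense sentence embeddings)
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, pool, top_k=3):
    # Rank stored items by similarity to the new task description
    q = embed(query)
    ranked = sorted(pool, key=lambda item: cosine(q, embed(item)), reverse=True)
    return ranked[:top_k]

pool = [
    "buy a red mug on the shop site",
    "answer a multi-hop question",
    "clean a mug in the kitchen sink",
]
print(retrieve("put a clean mug on the table", pool, top_k=1))
</code>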
===== System Architecture =====
<code>
graph TD
    A[Training Tasks] --> B[Experience Gathering]
    B --> C[Success Trajectories]
    B --> D[Failure Trajectories]
    C --> E[Insight Extraction via LLM]
    D --> E
    E --> F[Natural Language Insights]
    C --> G[Experience Pool]
    D --> G
    F --> H[Insight Memory Bank]
    G --> H
    I[New Task] --> J[Memory Retrieval]
    H --> J
    J --> K[Similar Trajectories]
    J --> L[Relevant Insights]
    K --> M[Augmented Reasoning Context]
    L --> M
    M --> N[ReAct Agent Execution]
    N --> O[Task Solution]
    N --> P[New Experience]
    P --> G
</code>
===== Code Example =====
<code python>
# Simplified ExpeL agent with experience-based learning
class ExpeL:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever
        self.experience_pool = []  # stored trajectories, successes and failures alike
        self.insights = []         # natural language rules extracted by the LLM

    def gather_experience(self, task, environment):
        trajectory = self._run_react(task, environment)
        trajectory["success"] = environment.check_success()
        self.experience_pool.append(trajectory)
        return trajectory

    def extract_insights(self, batch_size=5):
        recent = self.experience_pool[-batch_size:]
        successes = [t for t in recent if t["success"]]
        failures = [t for t in recent if not t["success"]]
        prompt = (
            f"Compare these successful trajectories:\n{successes}\n"
            f"With these failures:\n{failures}\n"
            "Extract general insights as rules for future tasks."
        )
        new_insights = self.llm.generate(prompt)
        self.insights.extend(self._parse_insights(new_insights))

    def solve(self, task, environment):
        similar_experiences = self.retriever.search(
            query=task, pool=self.experience_pool, top_k=3
        )
        relevant_insights = self.retriever.search(
            query=task, pool=self.insights, top_k=5
        )
        context = self._build_context(task, similar_experiences, relevant_insights)
        return self._run_react(context, environment)
</code>
===== Key Results =====
^ Benchmark ^ Task Type ^ ExpeL vs. Baselines ^
| HotpotQA | Multi-hop QA | Outperforms ReAct, Act, and imitation learning |
| ALFWorld | Household tasks | Consistent gains with experience accumulation |
| WebShop | E-commerce navigation | Surpasses strong baselines |
* Performance improves progressively as more experiences accumulate
* Positive forward transfer: insights from source tasks benefit unseen target tasks
* Ablation confirms synergy between trajectory retrieval and insight extraction
* All learning occurs at inference time with no weight updates
===== See Also =====
* [[agentverse|AgentVerse: Multi-Agent Collaboration]]
* [[reasoning_via_planning|RAP: Reasoning via Planning]]
* [[webgpt|WebGPT: Browser-Assisted Question Answering]]
===== References =====