ExpeL (Experiential Learning) is an autonomous agent framework introduced by Zhao et al. (2023) that enables LLM agents to learn from past experiences without any gradient-based parameter updates. With 419 citations, it demonstrates that agents can extract, store, and reuse natural language insights from both successes and failures, achieving progressive improvement over time.
ExpeL operates on the principle that an LLM can reflect on its own trajectories to distill reusable knowledge. Unlike fine-tuning approaches, all learning happens at the inference level through a structured memory system.
The learning objective can be expressed as:
$$\pi^*(a_t | s_t) = \arg\max_a \mathbb{E}\left[R \mid s_t, a, \mathcal{I}, \mathcal{E}\right]$$
where $\pi^*$ is the improved policy, $s_t$ is the current state, $R$ is the task reward, $\mathcal{I}$ is the set of extracted insights, and $\mathcal{E}$ is the experience pool of past trajectories.
The agent interacts with training tasks using a base planner (ReAct or Act) powered by GPT-3.5-turbo. Each interaction produces a trajectory $\tau = (s_0, a_0, o_0, \ldots, s_T)$ containing states, actions, and observations. Both successful and failed trajectories are stored.
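The trajectory-collection loop described above can be sketched as follows. The `llm` and `env` interfaces here are illustrative assumptions (the paper does not prescribe this API): the planner proposes an action given the history, the environment returns an observation, and the resulting `(observation, action, next_observation)` steps form the trajectory.

```python
# Minimal sketch of trajectory collection with a ReAct-style loop.
# The `llm.act` and `env.reset`/`env.step` interfaces are assumptions
# for illustration, not the paper's actual API.
def collect_trajectory(llm, env, task, max_steps=10):
    trajectory = []
    observation = env.reset(task)
    for _ in range(max_steps):
        # The planner proposes the next action given the task,
        # the history so far, and the current observation.
        action = llm.act(task, trajectory, observation)
        next_observation, done = env.step(action)
        trajectory.append((observation, action, next_observation))
        observation = next_observation
        if done:
            break
    return trajectory
```

Both successful and failed trajectories collected this way are appended to the experience pool.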
The LLM analyzes collected trajectories to extract natural language insights: abstract rules and strategies that generalize across tasks.
For new tasks, the agent retrieves relevant past trajectories and insights from memory, incorporating them into its reasoning context for more informed decision-making.
```python
# Simplified ExpeL agent with experience-based learning
class ExpeL:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever
        self.experience_pool = []
        self.insights = []

    def gather_experience(self, task, environment):
        # Run the base ReAct planner and record whether the episode succeeded.
        trajectory = self._run_react(task, environment)
        trajectory["success"] = environment.check_success()
        self.experience_pool.append(trajectory)
        return trajectory

    def extract_insights(self, batch_size=5):
        # Contrast recent successes with failures to distill reusable rules.
        recent = self.experience_pool[-batch_size:]
        successes = [t for t in recent if t["success"]]
        failures = [t for t in recent if not t["success"]]
        prompt = (
            f"Compare these successful trajectories:\n{successes}\n"
            f"With these failures:\n{failures}\n"
            "Extract general insights as rules for future tasks."
        )
        new_insights = self.llm.generate(prompt)
        self.insights.extend(self._parse_insights(new_insights))

    def solve(self, task, environment):
        # Retrieve similar past trajectories and relevant insights,
        # then condition the planner's context on them.
        similar_experiences = self.retriever.search(
            query=task, pool=self.experience_pool, top_k=3
        )
        relevant_insights = self.retriever.search(
            query=task, pool=self.insights, top_k=5
        )
        context = self._build_context(task, similar_experiences, relevant_insights)
        return self._run_react(context, environment)
```
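The retriever in the agent above is left abstract. A minimal sketch of one is given below; it scores pool items by cosine similarity of toy bag-of-words vectors so the example stays self-contained, whereas a real implementation would use dense sentence embeddings with kNN search (the `embed` and `Retriever` names here are assumptions for illustration, not the paper's API).

```python
# Minimal sketch of the `retriever` assumed by the ExpeL class above.
# `embed` is a toy bag-of-words stand-in for a real sentence embedder.
from collections import Counter
import math

def embed(text):
    # Toy embedding: word-count vector of the lowercased text.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Retriever:
    def search(self, query, pool, top_k):
        # Rank pool items by similarity to the query and keep the top_k.
        q = embed(query)
        scored = sorted(
            pool, key=lambda item: cosine(q, embed(str(item))), reverse=True
        )
        return scored[:top_k]
```

Swapping in a stronger embedder changes only `embed`; the ranking logic stays the same.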
| Benchmark | Task Type | ExpeL vs. Baselines |
|---|---|---|
| HotpotQA | Multi-hop QA | Outperforms ReAct, Act, and imitation learning |
| ALFWorld | Household tasks | Consistent gains with experience accumulation |
| WebShop | E-commerce navigation | Surpasses strong baselines |