ExpeL: Experiential Learning for LLM Agents

ExpeL (Experiential Learning) is an autonomous agent framework introduced by Zhao et al. (2023) that enables LLM agents to learn from past experiences without any gradient-based parameter updates. Widely cited (419 citations), it demonstrates that agents can extract, store, and reuse natural language insights from both successes and failures, improving progressively over time.

arXiv:2308.10144

Core Mechanism

ExpeL operates on the principle that an LLM can reflect on its own trajectories to distill reusable knowledge. Unlike fine-tuning approaches, all learning happens at the inference level through a structured memory system.

The learning objective can be expressed as:

$$\pi^*(a_t | s_t) = \arg\max_a \mathbb{E}\left[R \mid s_t, a, \mathcal{I}, \mathcal{E}\right]$$

where $\pi^*$ is the improved policy, $s_t$ is the current state, $\mathcal{I}$ is the set of extracted insights, and $\mathcal{E}$ is the experience pool of past trajectories.

Three-Stage Pipeline

Stage 1: Experience Gathering

The agent interacts with training tasks using a base planner (ReAct or Act) powered by GPT-3.5-turbo. Each interaction produces a trajectory $\tau = (s_0, a_0, o_0, \ldots, s_T)$ containing states, actions, and observations. Both successful and failed trajectories are stored.
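
A gathered trajectory can be represented as a simple record of (state, action, observation) triples plus a success flag. The field names below are illustrative, not taken from the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    # One (state, action, observation) triple from a ReAct rollout
    state: str
    action: str
    observation: str

@dataclass
class Trajectory:
    task: str
    steps: list[Step] = field(default_factory=list)
    success: bool = False  # set after the environment checks the outcome

# Example: a two-step ALFWorld-style rollout
traj = Trajectory(task="put a clean mug on the desk")
traj.steps.append(Step("at kitchen", "take mug 1", "You pick up the mug."))
traj.steps.append(Step("holding mug", "go to desk 1", "You arrive at the desk."))
traj.success = True
```

Both successful and failed trajectories are kept, since the extraction stage compares the two.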

Stage 2: Insight Extraction

The LLM analyzes collected trajectories to extract natural language insights: abstract rules and strategies that generalize across tasks.
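
In the paper, the extractor LLM maintains the insight list via edit operations (ADD, EDIT, UPVOTE, DOWNVOTE), pruning insights whose importance drops too low. A minimal sketch of applying such operations follows; the dict format, starting weight, and pruning threshold are assumptions:

```python
def apply_insight_ops(insights, operations):
    """Apply LLM-proposed operations to a list of {"text", "votes"} insights."""
    for op in operations:
        kind = op["op"]
        if kind == "ADD":
            insights.append({"text": op["text"], "votes": 2})  # new insights start at weight 2
        elif kind == "EDIT":
            insights[op["index"]]["text"] = op["text"]
        elif kind == "UPVOTE":
            insights[op["index"]]["votes"] += 1
        elif kind == "DOWNVOTE":
            insights[op["index"]]["votes"] -= 1
    # Prune insights whose vote count has dropped to zero or below
    return [i for i in insights if i["votes"] > 0]

insights = [{"text": "Search before answering.", "votes": 2}]
ops = [
    {"op": "UPVOTE", "index": 0},
    {"op": "ADD", "text": "Check item attributes before buying."},
]
insights = apply_insight_ops(insights, ops)
```

Keeping insights as editable text rather than model weights is what lets the agent improve without fine-tuning.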

Stage 3: Task Execution

For new tasks, the agent retrieves relevant past trajectories and insights from memory, incorporating them into its reasoning context for more informed decision-making.
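
The paper retrieves the top-k most similar past trajectories using sentence-embedding similarity; the bag-of-words cosine similarity below is a dependency-free stand-in to illustrate the retrieval step:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, pool: list[str], top_k: int = 3) -> list[str]:
    # Rank the pool by similarity to the new task description
    q = Counter(query.lower().split())
    ranked = sorted(pool, key=lambda d: cosine(q, Counter(d.lower().split())), reverse=True)
    return ranked[:top_k]

docs = [
    "find a mug and clean it in the sink",
    "buy a red shirt under 20 dollars",
    "put a clean mug on the desk",
]
hits = retrieve("clean mug on desk", docs, top_k=2)
```

In the real system, a proper embedding model replaces the word-count vectors, but the top-k ranking logic is the same.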

System Architecture

graph TD
    A[Training Tasks] --> B[Experience Gathering]
    B --> C[Success Trajectories]
    B --> D[Failure Trajectories]
    C --> E[Insight Extraction via LLM]
    D --> E
    E --> F[Natural Language Insights]
    C --> G[Experience Pool]
    D --> G
    F --> H[Insight Memory Bank]
    G --> H
    I[New Task] --> J[Memory Retrieval]
    H --> J
    J --> K[Similar Trajectories]
    J --> L[Relevant Insights]
    K --> M[Augmented Reasoning Context]
    L --> M
    M --> N[ReAct Agent Execution]
    N --> O[Task Solution]
    N --> P[New Experience]
    P --> G

Code Example

# Simplified ExpeL agent with experience-based learning.
# _run_react, _parse_insights, and _build_context are left abstract here.
class ExpeL:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever
        self.experience_pool = []  # stored trajectories, successes and failures
        self.insights = []         # distilled natural language rules

    def gather_experience(self, task, environment):
        # Stage 1: roll out the base ReAct agent and store the trajectory
        trajectory = self._run_react(task, environment)
        trajectory["success"] = environment.check_success()
        self.experience_pool.append(trajectory)
        return trajectory

    def extract_insights(self, batch_size=5):
        # Stage 2: compare recent successes with failures to distill rules
        recent = self.experience_pool[-batch_size:]
        successes = [t for t in recent if t["success"]]
        failures = [t for t in recent if not t["success"]]
        prompt = (
            f"Compare these successful trajectories:\n{successes}\n"
            f"With these failures:\n{failures}\n"
            "Extract general insights as rules for future tasks."
        )
        new_insights = self.llm.generate(prompt)
        self.insights.extend(self._parse_insights(new_insights))

    def solve(self, task, environment):
        # Stage 3: retrieve relevant memories and reason with them in context
        similar_experiences = self.retriever.search(
            query=task, pool=self.experience_pool, top_k=3
        )
        relevant_insights = self.retriever.search(
            query=task, pool=self.insights, top_k=5
        )
        context = self._build_context(task, similar_experiences, relevant_insights)
        return self._run_react(context, environment)
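
The `_build_context` helper above is left abstract; one plausible assembly is to prepend retrieved insights and trajectories to the task description. The prompt layout below is an illustrative assumption, not the paper's verbatim template:

```python
def build_context(task, experiences, insights):
    """Assemble an augmented prompt from retrieved memories (format assumed)."""
    lines = ["Insights distilled from past tasks:"]
    lines += [f"- {insight}" for insight in insights]
    lines.append("Similar past trajectories:")
    lines += [str(exp) for exp in experiences]
    lines.append(f"New task: {task}")
    return "\n".join(lines)

print(build_context("buy a mug", ["(trajectory 1)"], ["Search first."]))
```

Because the insights and trajectories enter only through the prompt, the base model itself never changes between tasks.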

Key Results

Benchmark | Task Type              | ExpeL vs. Baselines
HotpotQA  | Multi-hop QA           | Outperforms ReAct, Act, and imitation learning
ALFWorld  | Household tasks        | Consistent gains with experience accumulation
WebShop   | E-commerce navigation  | Surpasses strong baselines
