====== ExpeL: Experiential Learning for LLM Agents ======

**ExpeL** (Experiential Learning) is an autonomous agent framework introduced by Zhao et al. (2023) that enables LLM agents to learn from past experiences **without any gradient-based parameter updates**. With **419 citations**, it demonstrates that agents can extract, store, and reuse natural language insights from both successes and failures, achieving progressive improvement over time.(([[https://arxiv.org/abs/2308.10144|Zhao et al. "ExpeL: LLM Agents Are Experiential Learners" (2023)]]))(([[https://andrewzh112.github.io/expel/|ExpeL Project Page]]))(([[https://arxiv.org/abs/2303.11366|Shinn et al. "Reflexion: Language Agents with Verbal Reinforcement Learning" (2023)]]))

[[https://arxiv.org/abs/2308.10144|arXiv:2308.10144]](([[https://arxiv.org/abs/2210.03629|Yao et al. "ReAct: Synergizing Reasoning and Acting in Language Models" (2022)]]))

===== Core Mechanism =====

ExpeL operates on the principle that an LLM can reflect on its own trajectories to distill reusable knowledge. Unlike fine-tuning approaches, all learning happens at inference time through a structured memory system. The learning objective can be expressed as:

$$\pi^*(a_t \mid s_t) = \arg\max_a \mathbb{E}\left[R \mid s_t, a, \mathcal{I}, \mathcal{E}\right]$$

where $\pi^*$ is the improved policy, $s_t$ is the current state, $\mathcal{I}$ is the set of extracted insights, and $\mathcal{E}$ is the experience pool of past trajectories.

===== Three-Stage Pipeline =====

==== Stage 1: Experience Gathering ====

The agent interacts with training tasks using a base planner (ReAct or Act) powered by GPT-3.5-turbo. Each interaction produces a trajectory $\tau = (s_0, a_0, o_0, \ldots, s_T)$ containing states, actions, and observations. Both successful and failed trajectories are stored.
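The trajectory record gathered in Stage 1 can be sketched as a simple data structure. This is an illustrative assumption, not the paper's actual implementation; the ''Step'' and ''Trajectory'' names and fields are hypothetical:

```python
# Illustrative sketch of a Stage 1 trajectory record (field names are
# assumptions for illustration, not taken from the ExpeL codebase).
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    thought: str       # ReAct-style reasoning before acting
    action: str        # e.g. "Search[Tesla, Inc.]"
    observation: str   # feedback returned by the environment


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)
    success: bool = False  # set after the environment checks the outcome


traj = Trajectory(task="When was the company founded?")
traj.steps.append(Step(thought="I should search for the company first.",
                       action="Search[company]",
                       observation="The company was founded in 2003."))
traj.success = True
```

Storing failures alongside successes in this uniform format is what later allows Stage 2 to contrast the two sets.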
==== Stage 2: Insight Extraction ====

The LLM analyzes collected trajectories to extract **natural language insights** -- abstract rules and strategies that generalize across tasks:

  * Compare successful vs. failed trajectories on similar tasks
  * Identify patterns that led to success or failure
  * Distill task-agnostic strategies (e.g., "Always verify search results before answering")

==== Stage 3: Task Execution ====

For new tasks, the agent retrieves relevant past trajectories and insights from memory and incorporates them into its reasoning context for more informed decision-making.

===== System Architecture =====

<code>
graph TD
    A[Training Tasks] --> B[Experience Gathering]
    B --> C[Success Trajectories]
    B --> D[Failure Trajectories]
    C --> E[Insight Extraction via LLM]
    D --> E
    E --> F[Natural Language Insights]
    C --> G[Experience Pool]
    D --> G
    F --> H[Insight Memory Bank]
    G --> H
    I[New Task] --> J[Memory Retrieval]
    H --> J
    J --> K[Similar Trajectories]
    J --> L[Relevant Insights]
    K --> M[Augmented Reasoning Context]
    L --> M
    M --> N[ReAct Agent Execution]
    N --> O[Task Solution]
    N --> P[New Experience]
    P --> G
</code>

===== Code Example =====

<code python>
# Simplified ExpeL agent with experience-based learning
class ExpeL:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever
        self.experience_pool = []
        self.insights = []

    def gather_experience(self, task, environment):
        """Stage 1: run the base planner and store the trajectory."""
        trajectory = self._run_react(task, environment)
        trajectory["success"] = environment.check_success()
        self.experience_pool.append(trajectory)
        return trajectory

    def extract_insights(self, batch_size=5):
        """Stage 2: contrast successes and failures to distill rules."""
        recent = self.experience_pool[-batch_size:]
        successes = [t for t in recent if t["success"]]
        failures = [t for t in recent if not t["success"]]
        prompt = (
            f"Compare these successful trajectories:\n{successes}\n"
            f"With these failures:\n{failures}\n"
            "Extract general insights as rules for future tasks."
        )
        new_insights = self.llm.generate(prompt)
        self.insights.extend(self._parse_insights(new_insights))

    def solve(self, task, environment):
        """Stage 3: retrieve relevant memories, then act with them in context."""
        similar_experiences = self.retriever.search(
            query=task, pool=self.experience_pool, top_k=3
        )
        relevant_insights = self.retriever.search(
            query=task, pool=self.insights, top_k=5
        )
        context = self._build_context(
            task, similar_experiences, relevant_insights
        )
        return self._run_react(context, environment)
</code>

===== Key Results =====

^ Benchmark ^ Task Type ^ ExpeL vs. Baselines ^
| HotpotQA | Multi-hop QA | Outperforms ReAct, Act, and imitation learning |
| ALFWorld | Household tasks | Consistent gains with experience accumulation |
| WebShop | E-commerce navigation | Surpasses strong baselines |

  * Performance improves progressively as more experiences accumulate
  * Positive forward transfer: insights from source tasks benefit unseen target tasks
  * Ablation confirms synergy between trajectory retrieval and insight extraction
  * All learning occurs at inference time with no weight updates

===== See Also =====

  * [[agentverse|AgentVerse: Multi-Agent Collaboration]]
  * [[reasoning_via_planning|RAP: Reasoning via Planning]]
  * [[webgpt|WebGPT: Browser-Assisted Question Answering]]

===== References =====