====== Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization ======
  
Retroformer introduces a principled framework for **reinforcing LLM agents by learning a retrospective model** that automatically refines agent prompts from environment feedback through policy gradient optimization. Published by Yao et al. (2023) at ICLR 2024(([[https://arxiv.org/abs/2308.02151|Yao et al. (2023) - Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization]])), it is among the first works to apply gradient-based optimization to language agent improvement.
  
===== Overview =====
Most LLM agents use fixed prompts or rely on verbal self-reflection (e.g., Reflexion) without gradient-based learning. Retroformer addresses this gap by training a smaller, fine-tunable **retrospective model** that analyzes failed trajectories and generates improved reflections, optimized via policy gradients from actual environment rewards.
  
The key innovation: rather than hand-crafting reflection prompts or relying on LLM self-assessment, Retroformer **learns** to produce better reflections through reward-driven optimization.(([[https://github.com/weirayao/Retroformer|Retroformer GitHub Repository]]))
  
===== Architecture =====
===== Policy Gradient Optimization =====
  
The retrospective model is optimized using policy gradients.((https://arxiv.org/abs/2308.02151)) Given a trajectory <latex>\tau_i</latex> and reward <latex>r_i</latex>, the model generates a reflection <latex>y_{k,i}</latex> from input <latex>x_{k,i} = \{\tau_i, r_i\}</latex>. The quality of this reflection is measured by the subsequent episode return:
  
<latex>\nabla_\theta J = \mathbb{E}\left[\sum_{k} G_{k,i+1} \nabla_\theta \log P_\theta(y_{k,i} | x_{k,i})\right]</latex>
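This objective can be sketched as a REINFORCE-style update in PyTorch. The following is a hypothetical illustration, not the authors' implementation: the episode return <latex>G</latex> weights the summed log-probability of the generated reflection tokens, and the toy values are made up.

<code python>
import torch

def reinforce_loss(log_probs: torch.Tensor, episode_return: float) -> torch.Tensor:
    # Monte-Carlo policy gradient: minimizing -G * sum(log P(y|x))
    # ascends the gradient of the objective J.
    return -episode_return * log_probs.sum()

# Toy reflection with three generated tokens and a successful episode (G = 1.0).
log_probs = torch.tensor([-0.5, -1.2, -0.3], requires_grad=True)
loss = reinforce_loss(log_probs, episode_return=1.0)
loss.backward()
# Each token's gradient is -G, so reflections that preceded higher
# returns have their token probabilities pushed up.
</code>

Reflections followed by low or negative returns receive a small or negative weight, so the model is steered away from producing them.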
===== Key Results =====
  
  * On **AlfWorld** household tasks, Retroformer significantly outperforms frozen baselines((https://arxiv.org/abs/2308.02151))
  * Agents solve tasks within **3 retries**, with most improvement in early iterations
  * **Generalizes across environments and tasks** through multi-task reward learning
  * Higher LoRA rank (e.g., r=4) yields slight additional gains in the retrospective model
  * Outperforms non-gradient baselines (e.g., Reflexion) that use verbal-only feedback
  * Enhanced performance on **HotPotQA** question answering validates cross-domain applicability(([[https://proceedings.iclr.cc/paper_files/paper/2024/file/29f421fbdcc82aeb349d784d3aaccdb3-Paper-Conference.pdf|ICLR 2024 Conference Paper]]))
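The LoRA-rank ablation above can be sketched with the Hugging Face ''peft'' library. This is a hypothetical configuration: the ''target_modules'' names and the ''lora_alpha''/''lora_dropout'' values are assumptions for illustration, not taken from the paper.

<code python>
from peft import LoraConfig

# Rank-4 adapter for the retrospective model (higher r gave slight gains).
# target_modules are illustrative -- the right names depend on the base model.
lora_config = LoraConfig(
    r=4,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
</code>

The resulting config would be applied to the base model with ''peft.get_peft_model'' before fine-tuning on the reward-weighted reflection data.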
  
===== Code Example =====
        optimizer.step()
</code>
  
===== See Also =====
  * [[agent_finetuning|Agent Fine-tuning Methods]]
  * [[fireact_agent_finetuning|FireAct: Agent Fine-tuning]]

===== References =====