Differences

This shows you the differences between two versions of the page.

--- retroformer [2026/03/30 21:05] – Add inline footnotes agent
+++ retroformer [2026/03/30 22:17] (current) – Restructure: footnotes as references agent
@@ Line 1: / Line 1: @@
 ====== Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization ======
-Retroformer introduces a principled framework for **reinforcing LLM agents by learning a retrospective model** that automatically refines agent prompts from environment feedback through policy gradient optimization.((https://arxiv.org/abs/2308.02151)) Published by Yao et al. (2023) at ICLR 2024, it is among the first works to apply gradient-based optimization to language agent improvement.
+Retroformer introduces a principled framework for **reinforcing LLM agents by learning a retrospective model** that automatically refines agent prompts from environment feedback through policy gradient optimization.((https://arxiv.org/abs/2308.02151)) Published by Yao et al. (2023) at ICLR 2024(([[https://arxiv.org/abs/2308.02151|Yao et al. (2023) - Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization]])), it is among the first works to apply gradient-based optimization to language agent improvement.
 ===== Overview =====
@@ Line 7: / Line 7: @@
 Most LLM agents use fixed prompts or rely on verbal self-reflection (e.g., Reflexion) without gradient-based learning. Retroformer addresses this gap by training a smaller, fine-tunable **retrospective model** that analyzes failed trajectories and generates improved reflections, optimized via policy gradients from actual environment rewards.
-The key innovation: rather than hand-crafting reflection prompts or relying on LLM self-assessment, Retroformer **learns** to produce better reflections through reward-driven optimization.
+The key innovation: rather than hand-crafting reflection prompts or relying on LLM self-assessment, Retroformer **learns** to produce better reflections through reward-driven optimization.(([[https://github.com/weirayao/Retroformer|Retroformer GitHub Repository]]))
 ===== Architecture =====
@@ Line 48: / Line 48: @@
   * Higher LoRA rank (e.g., r=4) yields slight additional gains in the retrospective model
   * Outperforms non-gradient baselines (e.g., Reflexion) that use verbal-only feedback
-  * Enhanced performance on **HotPotQA** question answering validates cross-domain applicability
+  * Enhanced performance on **HotPotQA** question answering validates cross-domain applicability(([[https://proceedings.iclr.cc/paper_files/paper/2024/file/29f421fbdcc82aeb349d784d3aaccdb3-Paper-Conference.pdf|ICLR 2024 Conference Paper]]))
 ===== Code Example =====
@@ Line 82: / Line 82: @@
         optimizer.step()
 </code>
-===== References =====
-  * [[https://arxiv.org/abs/2308.02151|Yao et al. (2023) - Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization]]
-  * [[https://github.com/weirayao/Retroformer|Retroformer GitHub Repository]]
-  * [[https://proceedings.iclr.cc/paper_files/paper/2024/file/29f421fbdcc82aeb349d784d3aaccdb3-Paper-Conference.pdf|ICLR 2024 Conference Paper]]
 ===== See Also =====
@@ Line 94: / Line 88: @@
   * [[agent_finetuning|Agent Fine-tuning Methods]]
   * [[fireact_agent_finetuning|FireAct: Agent Fine-tuning]]
+===== References =====

AI Agent Knowledge Base

User Tools

Site Tools

Differences

Page Tools