Retroformer introduces a principled framework for reinforcing LLM agents by learning a retrospective model that automatically refines agent prompts from environment feedback through policy gradient optimization.1) Published by Yao et al. (2023) at ICLR 2024, it is among the first works to apply gradient-based optimization to language agent improvement.
Most LLM agents use fixed prompts or rely on verbal self-reflection (e.g., Reflexion) without gradient-based learning. Retroformer addresses this gap by training a smaller, fine-tunable retrospective model that analyzes failed trajectories and generates improved reflections, optimized via policy gradients from actual environment rewards.
The key innovation: rather than hand-crafting reflection prompts or relying on LLM self-assessment, Retroformer learns to produce better reflections through reward-driven optimization.
The system comprises two models: a frozen actor (a large LLM that executes the task) and a smaller, trainable retrospective model that analyzes failed trajectories and generates reflections used to refine the actor's prompt.
The retrospective model is optimized using policy gradients.2) Given a trajectory <latex>\tau_i</latex> and reward <latex>r_i</latex>, the model generates a reflection <latex>y_{k,i}</latex> from input <latex>x_{k,i} = \{\tau_i, r_i\}</latex>. The quality of this reflection is measured by the subsequent episode return:
<latex>\nabla_\theta J = \mathbb{E}\left[\sum_{k} G_{k,i+1} \nabla_\theta \log P_\theta(y_{k,i} | x_{k,i})\right]</latex>
where <latex>G_{k,i+1}</latex> is the return of the next episode after applying the reflection. This enables the retrospective model to learn which types of reflections lead to better task performance.
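A minimal sketch of this policy-gradient idea, with the retrospective model reduced to a softmax policy over a few candidate reflection templates (the templates and returns are illustrative stand-ins, not from the paper): sampling a reflection, observing the next-episode return <latex>G_{k,i+1}</latex>, and applying a REINFORCE update to the log-probability of the sampled reflection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate reflections the policy chooses among
# (stand-ins for text generated by the retrospective model).
templates = [
    "The plan omitted a verification step; add one before acting.",
    "The failure was random; retry the same plan.",
    "Shorten the trajectory by skipping observations.",
]

# Toy next-episode returns G_{k,i+1} observed after applying each reflection.
returns = np.array([1.0, 0.2, 0.1])

theta = np.zeros(len(templates))  # policy logits (the trainable parameters)
lr = 0.5

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for step in range(500):
    probs = softmax(theta)
    k = rng.choice(len(templates), p=probs)  # sample a reflection
    G = returns[k]                           # next-episode return
    grad_log = -probs
    grad_log[k] += 1.0                       # grad of log pi(k | theta)
    theta += lr * G * grad_log               # REINFORCE update

best = templates[int(np.argmax(softmax(theta)))]
print(best)
```

Reflections followed by higher returns gain probability mass, so the policy concentrates on the first template; the real model applies the same update to token-level log-probabilities of generated text rather than a fixed template set.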
The reflection output summarizes the root cause of the failure and proposes a revised plan for the next attempt.
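One way the reflection input <latex>x_{k,i} = \{\tau_i, r_i\}</latex> might be serialized for the retrospective model; the template wording and trajectory structure here are hypothetical, chosen only to illustrate the idea:

```python
def format_reflection_input(trajectory, reward):
    """Serialize a failed trajectory and its reward into a prompt
    asking the retrospective model to diagnose the failure.
    (Hypothetical template; the paper's exact format may differ.)"""
    steps = "\n".join(
        f"{i}. action: {action} -> observation: {obs}"
        for i, (action, obs) in enumerate(trajectory, 1)
    )
    return (
        f"The agent failed the task (reward {reward}).\n"
        f"Trajectory:\n{steps}\n"
        "Identify the root cause of the failure and suggest a revised plan:"
    )

prompt = format_reflection_input([("search('query')", "no results")], 0.0)
print(prompt)
```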
# Retroformer-style retrospective learning loop
# (sketch: load_frozen_actor, format_reflection_input, append_reflection,
#  and log_prob are placeholder helpers)
from transformers import AutoModelForCausalLM
import torch

# Frozen actor (large LLM) and trainable retrospective model
actor = load_frozen_actor('gpt-4')
retro_model = AutoModelForCausalLM.from_pretrained('retro-base')
optimizer = torch.optim.Adam(retro_model.parameters(), lr=1e-5)

for episode in range(num_episodes):
    # Actor executes the task with the current prompt
    trajectory, reward = actor.execute(task, prompt=current_prompt)

    if reward == 0:  # failed episode
        # Retrospective model generates a reflection from (trajectory, reward)
        reflection_input = format_reflection_input(trajectory, reward)
        reflection = retro_model.generate(reflection_input)

        # Append the reflection to the actor's prompt
        current_prompt = append_reflection(current_prompt, reflection)

        # Run the next episode to obtain the return G_{k,i+1}
        next_traj, next_reward = actor.execute(task, prompt=current_prompt)

        # REINFORCE update: weight the reflection's log-likelihood
        # under the retrospective model by the next-episode return
        optimizer.zero_grad()
        loss = -next_reward * log_prob(reflection, reflection_input)
        loss.backward()
        optimizer.step()