====== FireAct: Toward Language Agent Fine-tuning ======

FireAct is a fine-tuning approach that enables **smaller language models to perform agentic tasks at levels approaching GPT-4** by training on diverse trajectories generated by stronger models. Introduced by Chen et al. (2023), FireAct demonstrates that multi-task, multi-method trajectory data is the key to effective agent fine-tuning.(([[https://arxiv.org/abs/2310.05915|Chen et al. (2023) - FireAct: Toward Language Agent Fine-tuning]]))(([[https://princeton-nlp.github.io/fireact/|FireAct Project Page (Princeton NLP)]]))(([[https://fireact-agent.github.io|FireAct Demo and Resources]]))

===== Overview =====

Prompting-based agents (ReAct, Reflexion, CoT) are limited by the base model's capacity and require expensive few-shot demonstrations at inference time. FireAct shows that fine-tuning on GPT-4-generated trajectories allows 7B-parameter models to match or exceed prompted GPT-3.5 on agent tasks, with greater robustness to noisy tool outputs. The core insight: **data diversity across tasks and methods matters more than data volume**.

===== Methodology =====

<code>
graph TD
    A[Multiple Task Datasets] --> B[GPT-4 Trajectory Generation]
    B --> C1[CoT Trajectories]
    B --> C2[ReAct Trajectories]
    B --> C3[Reflexion Trajectories]
    C1 --> D[Convert to ReAct Format]
    C2 --> D
    C3 --> D
    D --> E[Filter Successful Trajectories]
    E --> F[Fine-tune Smaller Model]
    F --> G[Agent LM - No Few-shot Needed]
</code>

The training process:

  - **Trajectory Generation**: GPT-4 solves tasks from multiple datasets using CoT, ReAct, and Reflexion prompting
  - **Format Unification**: All successful trajectories are converted to the ReAct format (interleaved Thought/Action/Observation)
  - **Supervised Fine-tuning**: Smaller models (Llama2-7B, GPT-3.5) are fine-tuned on the unified trajectory data

The fine-tuning objective is the negative log-likelihood of the agent's tokens:

$$ \mathcal{L} = -\sum_{t=1}^{T} \log P_\theta(a_t \mid x, a_{<t}) $$

where $a_t$ denotes the thought-action tokens, $a_{<t}$ the preceding tokens, and $x$ the task input.
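The masked objective can be sketched in plain Python. This is an illustrative toy, not the paper's released code: the token probabilities, the `fireact_sft_loss` helper, and the flags marking which tokens are agent-generated are all assumptions made for the example. The key point it demonstrates is that Observation tokens come from the environment, not the policy, so they are excluded from the loss.

<code python>
import math

# Toy trajectory: each entry is (probability the model assigns to the token,
# whether the token is agent-generated). Observation tokens are tool output
# and carry is_agent=False, so they are masked out of the objective.
trajectory = [
    (0.9, True),   # Thought token
    (0.8, True),   # Action token
    (0.1, False),  # Observation token (tool output, masked)
    (0.7, True),   # Thought token
    (0.6, True),   # Action token (final answer)
]

def fireact_sft_loss(traj):
    """Negative log-likelihood summed over agent-generated tokens only,
    mirroring L = -sum_t log P(a_t | x, a_<t) with observations masked."""
    return -sum(math.log(p) for p, is_agent in traj if is_agent)

loss = fireact_sft_loss(trajectory)
print(round(loss, 4))  # 1.196
</code>

In a real training setup the same masking is typically applied by setting the labels of observation tokens to an ignore index so they contribute no gradient.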
Diversity is controlled by mixing trajectories from $K$ tasks and $M$ methods:

$$ \mathcal{D}_{\text{train}} = \bigcup_{k=1}^{K} \bigcup_{m=1}^{M} \mathcal{T}_{k,m} $$

===== Models and Benchmarks =====

^ Model ^ Role ^
| GPT-4 | Teacher: generates training trajectories |
| GPT-3.5 | Student: fine-tuned on trajectories |
| Llama2-7B | Student: fine-tuned on trajectories |

^ Benchmark ^ Task Type ^
| HotpotQA | Multi-hop question answering |
| StrategyQA | Strategic reasoning |
| Bamboogle | Complex QA |
| MMLU | Broad knowledge evaluation |

===== Key Results =====

  * **Llama2-7B on HotpotQA**: 77% relative performance gain after fine-tuning on 500 GPT-4 trajectories
  * **GPT-3.5 on HotpotQA**: EM 31.4 -> 39.2, a roughly 25% relative improvement
  * **GPT-3.5 on Bamboogle**: EM 40.8 -> 44.0, outperforming prompted GPT-3.5
  * **Robustness**: fine-tuned agents degrade by only 14.2% with noisy tool outputs vs. 33.8% for prompted agents
  * Multi-task training consistently outperforms single-task training
  * Fine-tuned agents require **no few-shot examples** at inference, reducing prompt length and cost

===== Code Example =====

<code python>
# FireAct-style trajectory generation and fine-tuning pipeline (sketch).
# build_agent_prompt, gpt4_agent, and convert_to_react_format are assumed
# helpers; dataset ids are illustrative.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Step 1: Generate a trajectory with GPT-4, keeping only successful runs
def generate_trajectory(task, method='react'):
    prompt = build_agent_prompt(task, method)
    trajectory = gpt4_agent.run(prompt, tools=['search', 'lookup'])
    if trajectory.success:
        return convert_to_react_format(trajectory)
    return None

# Step 2: Collect multi-task, multi-method trajectories
trajectories = []
for dataset_name in ['hotpotqa', 'strategyqa', 'bamboogle']:
    dataset = load_dataset(dataset_name, split='train[:500]')
    for method in ['cot', 'react', 'reflexion']:
        for sample in dataset:
            traj = generate_trajectory(sample, method)
            if traj:
                trajectories.append(traj)

# Step 3: Fine-tune the smaller model on the unified trajectory data
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf')
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='./fireact-7b', num_train_epochs=3),
    train_dataset=trajectories,
)
trainer.train()
</code>

===== See Also =====

  * [[agenttuning|AgentTuning: Instruction-Tuning for Agent Abilities]]
  * [[react|ReAct: Reasoning and Acting]]
  * [[reflexion|Reflexion: Verbal Reinforcement Learning]]
  * [[retroformer|Retroformer: Policy Gradient Agent Optimization]]

===== References =====