====== FireAct: Toward Language Agent Fine-tuning ======
FireAct is a fine-tuning approach that enables **smaller language models to perform agentic tasks at levels approaching GPT-4** by training on diverse trajectories generated by stronger models. Introduced by Chen et al. (2023), FireAct demonstrates that multi-task, multi-method trajectory data is the key to effective agent fine-tuning.(([[https://arxiv.org/abs/2310.05915|Chen et al. (2023) - FireAct: Toward Language Agent Fine-tuning]]))(([[https://princeton-nlp.github.io/fireact/|FireAct Project Page (Princeton NLP)]]))(([[https://fireact-agent.github.io|FireAct Demo and Resources]]))
===== Overview =====
Prompting-based agents (ReAct, Reflexion, CoT) are limited by the base model's capacity and require expensive few-shot demonstrations at inference. FireAct shows that fine-tuning on GPT-4-generated trajectories allows 7B-parameter models to match or exceed prompted GPT-3.5 on agent tasks, with greater robustness to noisy tool outputs.
The core insight: **data diversity across tasks and methods matters more than data volume**.
===== Methodology =====
<code>
graph TD
  A[Multiple Task Datasets] --> B[GPT-4 Trajectory Generation]
  B --> C1[CoT Trajectories]
  B --> C2[ReAct Trajectories]
  B --> C3[Reflexion Trajectories]
  C1 --> D[Convert to ReAct Format]
  C2 --> D
  C3 --> D
  D --> E[Filter Successful Trajectories]
  E --> F[Fine-tune Smaller Model]
  F --> G[Agent LM - No Few-shot Needed]
</code>
The training process:
- **Trajectory Generation**: GPT-4 solves tasks from multiple datasets using CoT, ReAct, and Reflexion prompting
- **Format Unification**: All successful trajectories are converted to the ReAct format (interleaved Thought/Action/Observation)
- **Supervised Fine-tuning**: Smaller models (Llama2-7B, GPT-3.5) are fine-tuned on the unified trajectory data
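The format-unification step above can be sketched in a few lines. A CoT solution has no tool calls, so it collapses into a single ReAct step whose action is ''finish[answer]''; the helper below is a hypothetical illustration of that conversion, not the paper's exact template:

```python
# Sketch: unify a plain CoT trajectory into ReAct's interleaved
# Thought/Action format. A CoT rationale becomes one step whose
# action is finish[answer] (illustrative schema, not the paper's
# verbatim prompt template).
def cot_to_react(question: str, rationale: str, answer: str) -> str:
    return (
        f"Question: {question}\n"
        f"Thought 1: {rationale}\n"
        f"Action 1: finish[{answer}]\n"
    )

sample = cot_to_react(
    "Which U.S. university was founded in 1746?",
    "The College of New Jersey, founded in 1746, is now Princeton University.",
    "Princeton University",
)
print(sample)
```

Because every method's output ends up in the same Thought/Action/Observation schema, trajectories from CoT, ReAct, and Reflexion can be mixed freely in one training set.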
The fine-tuning objective:
\mathcal{L} = -\sum_{t=1}^{T} \log P_\theta(a_t \mid x, a_{<t})
where a_t are the thought-action tokens and x is the task input. Diversity is controlled by mixing trajectories from K tasks and M methods:
\mathcal{D}_{\text{train}} = \bigcup_{k=1}^{K} \bigcup_{m=1}^{M} \mathcal{T}_{k,m}
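In practice this objective is ordinary next-token cross-entropy restricted to the thought/action tokens; masking the task input and tool observations with the ignore index is an assumption consistent with standard supervised fine-tuning, shown here as a minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

# Sketch: token-level cross-entropy over thought/action tokens only.
# Positions belonging to the task input x or to tool observations are
# labeled -100 so they are excluded from the loss (an assumed masking
# convention, matching common supervised fine-tuning practice).
def fireact_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits: (T, V) per-token vocabulary scores; labels: (T,) target
    # token ids, with -100 at masked positions.
    return F.cross_entropy(logits, labels, ignore_index=-100)

torch.manual_seed(0)
logits = torch.randn(6, 10)                          # toy: 6 tokens, vocab 10
labels = torch.tensor([-100, -100, 3, 7, 1, -100])   # only 3 agent tokens count
loss = fireact_loss(logits, labels)
print(float(loss))
```

The diversity mixing in the second equation then amounts to concatenating the per-task, per-method trajectory sets before shuffling and training.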
===== Models and Benchmarks =====
^ Model ^ Role ^
| GPT-4 | Teacher: generates training trajectories |
| GPT-3.5 | Student: fine-tuned on trajectories |
| Llama2-7B | Student: fine-tuned on trajectories |
^ Benchmark ^ Task Type ^
| HotpotQA | Multi-hop question answering |
| StrategyQA | Strategic reasoning |
| Bamboogle | Complex QA |
| MMLU | Broad knowledge evaluation |
===== Key Results =====
  * **Llama2-7B on HotpotQA**: 77% relative performance increase after fine-tuning on just 500 GPT-4 trajectories
  * **GPT-3.5 on HotpotQA**: EM score 31.4 -> 39.2 (a 25% relative improvement over prompting)
* **GPT-3.5 on Bamboogle**: EM 40.8 -> 44.0, outperforming prompted GPT-3.5
* **Robustness**: Fine-tuned agents show only 14.2% performance drop with noisy tools vs. 33.8% for prompted agents
* Multi-task training consistently outperforms single-task training
* Fine-tuned agents require **no few-shot examples** at inference, reducing cost
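The last point is worth making concrete: a prompted agent must prepend several full few-shot trajectories to every call, while a fine-tuned agent's prompt is essentially just the question. The comparison below is schematic (the prompt strings are illustrative placeholders, not the paper's templates):

```python
# Illustrative inference-cost comparison: prompted agents pay for k
# few-shot trajectories per call; fine-tuned agents send only the task.
# The "..." fields are deliberate placeholders for example content.
FEWSHOT_EXAMPLE = (
    "Question: ...\nThought 1: ...\n"
    "Action 1: search[...]\nObservation 1: ...\n"
)

def prompted_agent_prompt(question: str, k: int = 6) -> str:
    # k in-context trajectories precede the actual question
    return FEWSHOT_EXAMPLE * k + f"Question: {question}\n"

def finetuned_agent_prompt(question: str) -> str:
    # agent behavior is baked into the weights; no demonstrations needed
    return f"Question: {question}\n"

q = "Who wrote the novel adapted into the 1962 film?"
print(len(prompted_agent_prompt(q)), len(finetuned_agent_prompt(q)))
```

Every token of removed few-shot context is saved on every call, which compounds quickly for high-volume agent deployments.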
===== Code Example =====
<code python>
# FireAct-style trajectory generation and fine-tuning pipeline (sketch).
# gpt4_agent, build_agent_prompt, and convert_to_react_format are
# hypothetical helpers standing in for a GPT-4 agent harness.
from datasets import Dataset, load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Step 1: Generate a trajectory with GPT-4 using a given prompting method
def generate_trajectory(task, method='react'):
    prompt = build_agent_prompt(task, method)
    trajectory = gpt4_agent.run(prompt, tools=['search', 'lookup'])
    if trajectory.success:  # keep only successful trajectories
        return convert_to_react_format(trajectory)
    return None

# Step 2: Collect multi-task, multi-method trajectories
trajectories = []
for dataset_name in ['hotpotqa', 'strategyqa', 'bamboogle']:
    dataset = load_dataset(dataset_name, split='train[:500]')
    for method in ['cot', 'react', 'reflexion']:
        for sample in dataset:
            traj = generate_trajectory(sample, method)
            if traj is not None:
                trajectories.append({'text': traj})

# Step 3: Tokenize the unified data and fine-tune the smaller model
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf')

def tokenize(batch):
    out = tokenizer(batch['text'], truncation=True, max_length=2048)
    out['labels'] = out['input_ids'].copy()
    return out

train_dataset = Dataset.from_list(trajectories).map(
    tokenize, batched=True, remove_columns=['text'])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='./fireact-7b', num_train_epochs=3),
    train_dataset=train_dataset,
)
trainer.train()
</code>
===== See Also =====
* [[agenttuning|AgentTuning: Instruction-Tuning for Agent Abilities]]
* [[react|ReAct: Reasoning and Acting]]
* [[reflexion|Reflexion: Verbal Reinforcement Learning]]
* [[retroformer|Retroformer: Policy Gradient Agent Optimization]]
===== References =====