FireAct is a fine-tuning approach that enables smaller language models to perform agentic tasks at levels approaching GPT-4 by training on diverse trajectories generated by stronger models. Introduced by Chen et al. (2023), FireAct demonstrates that multi-task, multi-method trajectory data is the key to effective agent fine-tuning.1)2)3)
Prompting-based agents (ReAct, Reflexion, CoT) are limited by the base model's capacity and require expensive few-shot demonstrations at inference. FireAct shows that fine-tuning on GPT-4-generated trajectories allows 7B-parameter models to match or exceed prompted GPT-3.5 on agent tasks, with greater robustness to noisy tool outputs.
The core insight: data diversity across tasks and methods matters more than data volume.
The training process:

1. Use GPT-4 to solve training tasks with several prompting methods (CoT, ReAct, Reflexion), recording each run as a trajectory of thoughts, actions, and observations.
2. Keep only successful trajectories and convert them into a unified ReAct-style format.
3. Fine-tune the smaller model on the mixed trajectory data with a standard language-modeling objective.
The fine-tuning objective:
<latex>\mathcal{L} = -\sum_{t=1}^{T} \log P_\theta(a_t | x, a_{<t})</latex>
where <latex>a_t</latex> are the thought-action tokens and <latex>x</latex> is the task input. Diversity is controlled by mixing trajectories from <latex>K</latex> tasks and <latex>M</latex> methods:
<latex>\mathcal{D}_{\text{train}} = \bigcup_{k=1}^{K} \bigcup_{m=1}^{M} \mathcal{T}_{k,m}</latex>
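These two formulas can be sketched numerically. This is a minimal illustration, not the paper's implementation: `trajectory_nll` and `mix_trajectories` are hypothetical helpers, and a real trainer would compute the loss over model logits with the prompt tokens `x` masked out.

```python
import math

def trajectory_nll(token_probs):
    """L = -sum_t log P(a_t | x, a_<t): negative log-likelihood of the
    thought-action tokens a_t. Prompt tokens x are excluded (masked)."""
    return -sum(math.log(p) for p in token_probs)

def mix_trajectories(per_source):
    """D_train = union over K tasks and M methods of T_{k,m}.
    per_source maps (task, method) -> list of trajectories."""
    return [traj for trajs in per_source.values() for traj in trajs]

# Toy example: per-token probabilities the model assigns to one trajectory.
loss = trajectory_nll([0.9, 0.8, 0.95, 0.7])

# Mixing trajectories from 2 tasks x 2 methods gives 6 training examples.
data = mix_trajectories({
    ('hotpotqa', 'react'):   ['traj_a', 'traj_b'],
    ('hotpotqa', 'cot'):     ['traj_c'],
    ('strategyqa', 'react'): ['traj_d'],
    ('strategyqa', 'cot'):   ['traj_e', 'traj_f'],
})
```

The point of `mix_trajectories` is that diversity enters only through the composition of the union, not through any change to the per-token loss itself.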
| Model | Role |
|---|---|
| GPT-4 | Teacher: generates training trajectories |
| GPT-3.5 | Student: fine-tuned on trajectories |
| Llama2-7B | Student: fine-tuned on trajectories |
| Benchmark | Task Type |
|---|---|
| HotpotQA | Multi-hop question answering |
| StrategyQA | Strategic reasoning |
| Bamboogle | Complex QA |
| MMLU | Broad knowledge evaluation |
```python
# FireAct-style trajectory generation and fine-tuning pipeline.
# Note: build_agent_prompt, gpt4_agent, and convert_to_react_format are
# assumed helpers, and train_dataset would need tokenized examples in practice.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Step 1: Generate trajectories with GPT-4
def generate_trajectory(task, method='react'):
    prompt = build_agent_prompt(task, method)  # prompt template for the chosen method
    trajectory = gpt4_agent.run(prompt, tools=['search', 'lookup'])
    if trajectory.success:  # keep only successful runs
        return convert_to_react_format(trajectory)
    return None

# Step 2: Collect multi-task, multi-method trajectories
trajectories = []
for dataset_name in ['hotpotqa', 'strategyqa', 'bamboogle']:
    dataset = load_dataset(dataset_name, split='train[:500]')
    for method in ['cot', 'react', 'reflexion']:
        for sample in dataset:
            traj = generate_trajectory(sample, method)
            if traj:
                trajectories.append(traj)

# Step 3: Fine-tune smaller model on the mixed trajectory data
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf')
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='./fireact-7b', num_train_epochs=3),
    train_dataset=trajectories,
)
trainer.train()
```