====== FireAct: Toward Language Agent Fine-tuning ======

FireAct is a fine-tuning approach that enables **smaller language models to perform agentic tasks at levels approaching GPT-4** by training on diverse trajectories generated by stronger models. Introduced by Chen et al. (2023), FireAct demonstrates that multi-task, multi-method trajectory data is the key to effective agent fine-tuning.(([[https://arxiv.org/abs/2310.05915|Chen et al. (2023) - FireAct: Toward Language Agent Fine-tuning]]))(([[https://princeton-nlp.github.io/fireact/|FireAct Project Page (Princeton NLP)]]))(([[https://fireact-agent.github.io|FireAct Demo and Resources]]))

===== Overview =====

Prompting-based agents (ReAct, Reflexion, CoT) are limited by the base model's capacity and require expensive few-shot demonstrations at inference time. FireAct shows that fine-tuning on GPT-4-generated trajectories allows 7B-parameter models to match or exceed prompted GPT-3.5 on agent tasks, with greater robustness to noisy tool outputs. The core insight: **data diversity across tasks and methods matters more than data volume**.

===== Methodology =====

<code>
graph TD
    A[Multiple Task Datasets] --> B[GPT-4 Trajectory Generation]
    B --> C1[CoT Trajectories]
    B --> C2[ReAct Trajectories]
    B --> C3[Reflexion Trajectories]
    C1 --> D[Convert to ReAct Format]
    C2 --> D
    C3 --> D
    D --> E[Filter Successful Trajectories]
    E --> F[Fine-tune Smaller Model]
    F --> G[Agent LM - No Few-shot Needed]
</code>

The training process:

  - **Trajectory Generation**: GPT-4 solves tasks from multiple datasets using CoT, ReAct, and Reflexion prompting
  - **Format Unification**: All successful trajectories are converted to the ReAct format (interleaved Thought/Action/Observation)
  - **Supervised Fine-tuning**: Smaller models (Llama2-7B, GPT-3.5) are fine-tuned on the unified trajectory data

The fine-tuning objective is the negative log-likelihood of the agent's tokens:

$$ \mathcal{L} = -\sum_{t=1}^{T} \log P_\theta(a_t \mid x, a_{<t}) $$

where $a_t$ denotes the thought-action tokens, $a_{<t}$ the preceding tokens, and $x$ the task input.
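The masked objective can be sketched in plain Python. This is an illustrative toy, not the paper's released code: the token probabilities, the `fireact_sft_loss` helper, and the flags marking which tokens are agent-generated are all assumptions made for the example. The key point it demonstrates is that Observation tokens come from the environment, not the policy, so they are excluded from the loss.

<code python>
import math

# Toy trajectory: each entry is (probability the model assigns to the token,
# whether the token is agent-generated). Observation tokens are tool output
# and carry is_agent=False, so they are masked out of the objective.
trajectory = [
    (0.9, True),   # Thought token
    (0.8, True),   # Action token
    (0.1, False),  # Observation token (tool output, masked)
    (0.7, True),   # Thought token
    (0.6, True),   # Action token (final answer)
]

def fireact_sft_loss(traj):
    """Negative log-likelihood summed over agent-generated tokens only,
    mirroring L = -sum_t log P(a_t | x, a_<t) with observations masked."""
    return -sum(math.log(p) for p, is_agent in traj if is_agent)

loss = fireact_sft_loss(trajectory)
print(round(loss, 4))  # 1.196
</code>

In a real training setup the same masking is typically applied by setting the labels of observation tokens to an ignore index so they contribute no gradient.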
Diversity is controlled by mixing trajectories from $K$ tasks and $M$ methods:

$$ \mathcal{D}_{\text{train}} = \bigcup_{k=1}^{K} \bigcup_{m=1}^{M} \mathcal{T}_{k,m} $$

===== Models and Benchmarks =====

^ Model ^ Role ^
| GPT-4 | Teacher: generates training trajectories |
| GPT-3.5 | Student: fine-tuned on trajectories |
| Llama2-7B | Student: fine-tuned on trajectories |

^ Benchmark ^ Task Type ^
| HotpotQA | Multi-hop question answering |
| StrategyQA | Strategic reasoning |
| Bamboogle | Complex QA |
| MMLU | Broad knowledge evaluation |

===== Key Results =====

  * **Llama2-7B on HotpotQA**: 77% relative performance gain after fine-tuning on 500 GPT-4 trajectories
  * **GPT-3.5 on HotpotQA**: EM 31.4 -> 39.2, a roughly 25% relative improvement
  * **GPT-3.5 on Bamboogle**: EM 40.8 -> 44.0, outperforming prompted GPT-3.5
  * **Robustness**: fine-tuned agents degrade by only 14.2% with noisy tool outputs vs. 33.8% for prompted agents
  * Multi-task training consistently outperforms single-task training
  * Fine-tuned agents require **no few-shot examples** at inference, reducing prompt length and cost

===== Code Example =====

<code python>
# FireAct-style trajectory generation and fine-tuning pipeline (sketch).
# build_agent_prompt, gpt4_agent, and convert_to_react_format are assumed
# helpers; dataset ids are illustrative.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Step 1: Generate a trajectory with GPT-4, keeping only successful runs
def generate_trajectory(task, method='react'):
    prompt = build_agent_prompt(task, method)
    trajectory = gpt4_agent.run(prompt, tools=['search', 'lookup'])
    if trajectory.success:
        return convert_to_react_format(trajectory)
    return None

# Step 2: Collect multi-task, multi-method trajectories
trajectories = []
for dataset_name in ['hotpotqa', 'strategyqa', 'bamboogle']:
    dataset = load_dataset(dataset_name, split='train[:500]')
    for method in ['cot', 'react', 'reflexion']:
        for sample in dataset:
            traj = generate_trajectory(sample, method)
            if traj:
                trajectories.append(traj)

# Step 3: Fine-tune the smaller model on the unified trajectory data
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf')
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='./fireact-7b', num_train_epochs=3),
    train_dataset=trajectories,
)
trainer.train()
</code>

===== See Also =====

  * [[agenttuning|AgentTuning: Instruction-Tuning for Agent Abilities]]
  * [[react|ReAct: Reasoning and Acting]]
  * [[reflexion|Reflexion: Verbal Reinforcement Learning]]
  * [[retroformer|Retroformer: Policy Gradient Agent Optimization]]

===== References =====