====== Fine-Tuning Agents ======

Fine-tuning LLMs for agent tasks means training models on domain-specific data to improve their reliability at tool calling, instruction following, and structured reasoning. While [[prompt_engineering|prompt engineering]] and RAG handle many use cases, fine-tuning becomes essential when agents need consistent behavior on specialized tasks, strict structured-output compliance, or comparable performance at smaller model sizes and lower cost.(([[https://simplismart.ai/blog/fine-tuning-llms-in-2025-when-it-makes-sense-and-how-to-do-it-efficiently|SimpliSmart - Fine-Tuning LLMs in 2025]]))(([[https://towardsai.net/p/data-science/fine-tuning-llms-in-2025-techniques-trade-offs-and-use-cases|Towards AI - Fine-Tuning Techniques and Trade-offs]]))

===== When to Fine-Tune vs. Prompt Engineer =====

^ Scenario ^ Recommended Approach ^ Rationale ^
| Rapid prototyping | [[prompt_engineering|Prompt engineering]] | Fast iteration, no training infrastructure needed |
| General-purpose agent | [[prompt_engineering|Prompt engineering]] + RAG | Flexible, leverages base model capabilities |
| Consistent [[structured_outputs|structured outputs]] | Fine-tuning | Strongly improves format compliance at inference time |
| Domain-specific tool calling | Fine-tuning | Improves reliability of function signatures and arguments |
| Reducing model size/cost | Fine-tuning smaller model | Distills capabilities from a large model into a small one |
| Improving instruction following | Fine-tuning | Aligns model behavior with specific operational rules |
| Adapting to proprietary data | Fine-tuning + RAG | Combines learned patterns with retrieved context |

**Rule of thumb:** Start with [[prompt_engineering|prompt engineering]]. If evaluation shows consistent failures on specific behaviors even after prompt optimization, fine-tune.
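The rule of thumb above implies an evaluation gate: measure the prompted agent's tool-calling accuracy on a held-out set, and only invest in fine-tuning when accuracy stays below an acceptance threshold. A minimal sketch of such a gate is below; the eval cases, the ''stub_agent'' standing in for a prompted model, and the 95% threshold are all illustrative assumptions, not part of any standard harness.

<code python>
def tool_call_accuracy(eval_cases, predict):
    """Fraction of cases where the agent picks the expected tool
    with exactly the expected arguments."""
    correct = 0
    for case in eval_cases:
        pred = predict(case["query"])
        if (pred["name"] == case["expected"]["name"]
                and pred["arguments"] == case["expected"]["arguments"]):
            correct += 1
    return correct / len(eval_cases)

def should_fine_tune(accuracy, threshold=0.95):
    # Below threshold after prompt optimization -> consider fine-tuning
    return accuracy < threshold

# Toy eval set (hypothetical tools: get_weather, convert_units)
eval_cases = [
    {"query": "weather in Paris",
     "expected": {"name": "get_weather", "arguments": {"city": "Paris"}}},
    {"query": "convert 5 km to miles",
     "expected": {"name": "convert_units",
                  "arguments": {"value": 5, "from": "km", "to": "mi"}}},
]

def stub_agent(query):
    # Stands in for a prompt-only agent that always reaches for
    # get_weather and fails on unit conversion
    return {"name": "get_weather", "arguments": {"city": "Paris"}}

acc = tool_call_accuracy(eval_cases, stub_agent)
decision = should_fine_tune(acc)
</code>

Here the stub scores 50% accuracy, so the gate recommends fine-tuning; in practice ''predict'' would wrap a real model call and the eval set would cover every tool in the agent's schema.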
===== Fine-Tuning Techniques =====

==== Supervised Fine-Tuning (SFT) ====

Train on curated (prompt, completion) pairs that demonstrate the desired agent behavior. For tool-use agents, this includes examples of correct function calls, argument formatting, and multi-step reasoning chains.

==== LoRA and QLoRA ====

**LoRA** (Low-Rank Adaptation) inserts small trainable low-rank matrices into frozen model layers, reducing training compute by 10-100x while largely preserving quality. **QLoRA** (Quantized LoRA) adds 4-bit quantization of the frozen base weights, enabling fine-tuning of billion-parameter models on consumer GPUs.

<code python>
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

# Load the base model with 4-bit quantization (QLoRA)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3-8B-Instruct",
    load_in_4bit=True  # QLoRA quantization
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3-8B-Instruct")

# Configure LoRA adapters on the attention projections
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,             # rank of the low-rank matrices
    lora_alpha=32,    # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"]
)
model = get_peft_model(model, lora_config)

# Train on a tool-calling dataset of (prompt, tool_call) pairs
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tool_calling_dataset,
    args=TrainingArguments(
        output_dir="./agent-lora",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-4,
        warmup_steps=100
    )
)
trainer.train()
</code>

==== RLHF (Reinforcement Learning from Human Feedback) ====

Aligns agent behavior with human preferences through three phases:

  - **Collect comparisons** — humans rank agent outputs for the same input
  - **Train a reward model** — a model learns to score outputs based on the human preferences
  - **Optimize with PPO** — the agent is trained via [[reinforcement_learning|reinforcement learning]] to maximize the reward model's score

RLHF produces safer, more
helpful agents but requires significant human annotation effort.

==== DPO (Direct Preference Optimization) ====

Simplifies RLHF by directly optimizing on preference pairs without training a separate reward model. DPO is more stable and computationally efficient, making it practical for smaller teams fine-tuning agent behavior.

===== Datasets for Tool-Use Fine-Tuning =====

Effective fine-tuning for [[function_calling|function calling]] requires curated datasets:

  * **Function call pairs** — (user_query, correct_tool_call_with_arguments) examples demonstrating proper invocation
  * **Multi-step traces** — complete agent trajectories showing planning, tool calls, and synthesis
  * **Error recovery examples** — demonstrations of handling failed tool calls gracefully
  * **Negative examples** — cases where no tool should be called, teaching the model restraint

Public datasets include [[https://gorilla.cs.berkeley.edu/|Gorilla APIBench]] for API calling and [[https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k|xLAM Function Calling]] for structured tool use.(([[https://gorilla.cs.berkeley.edu/|Gorilla - LLM API Calling Benchmark]]))

===== Evaluation =====

  * **Loss convergence** — monitor training and validation loss for overfitting
  * **[[function_calling|Function calling]] accuracy** — percentage of correct tool selections and argument formatting
  * **BFCL benchmark** — Berkeley [[function_calling|Function Calling]] Leaderboard scores before and after fine-tuning
  * **Task completion rate** — end-to-end success on representative agent tasks
  * **Regression testing** — ensure fine-tuning doesn't degrade general capabilities

===== See Also =====

  * [[agenttuning|AgentTuning: Enabling Generalized Agent Capabilities in LLMs]]
  * [[tool_use|Tool Use for LLM Agents]]
  * [[how_to_fine_tune_an_llm|How to Fine-Tune an LLM]]
  * [[agentic_skills|Agentic Skills]]
  * [[agentbench|AgentBench]]

===== References =====