====== ToRA: Tool-Integrated Reasoning Agents for Mathematical Problem Solving ======

ToRA (Tool-integrated Reasoning Agents) is a series of LLM-based agents that solve complex mathematical problems by **interleaving natural language reasoning with program-based tool execution**(([[https://arxiv.org/abs/2309.17452|Gou et al. (2023) - ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving]])). Introduced by Gou et al. (2023) and published at ICLR 2024, ToRA achieves state-of-the-art results on mathematical benchmarks by combining the analytical clarity of chain-of-thought reasoning with the computational precision of code execution.

===== Overview =====

Pure natural language reasoning struggles with precise computation, while pure program synthesis misses high-level mathematical insight. ToRA bridges this gap with a hybrid approach: the model reasons verbally about the problem structure, writes code that performs calculations using external tools, observes the results, and continues reasoning. The reported improvement over single-modality approaches on MATH with LLaMA-2:

$$\Delta_{\text{ToRA}} = +29.0\%\ \text{(vs. rationale-only)},\quad +6.7\%\ \text{(vs. program-only)}$$

===== Methodology =====

<code>
graph LR
    A[Math Problem] --> B[NL Reasoning Step]
    B --> C[Code Generation]
    C --> D[Tool Execution]
    D --> E[Observe Output]
    E --> F{Problem Solved?}
    F -->|No| B
    F -->|Yes| G[Final Answer]
</code>

Training proceeds in three stages(([[https://openreview.net/forum?id=Ep0TtjVoap|ICLR 2024 Paper]])):

  - **Trajectory Curation**: Interactive tool-use trajectories are collected by prompting GPT-4 on math datasets
  - **Imitation Learning**: Smaller models are trained on the curated trajectories
  - **Output Space Shaping**: Reasoning and tool interactions are refined to ensure valid execution

The training objective minimizes the cross-entropy over trajectory tokens:

$$\mathcal{L} = -\sum_{t=1}^{T} \log P_\theta(y_t \mid y_{<t}, x)$$

where $x$ is the problem and $y_{1:T}$ is the interleaved reasoning-code trajectory.
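The interleaved loop in the diagram above can be sketched as follows. This is a minimal illustration, not the paper's implementation: ''generate_step'' is a hypothetical stand-in for any LLM call that returns the next reasoning chunk (possibly containing a fenced Python block), and code is executed with ''exec'' purely for demonstration.

<code python>
import contextlib
import io
import re

def run_tool_integrated_reasoning(generate_step, problem, max_rounds=5):
    """Sketch of a ToRA-style reason -> code -> execute -> observe loop.

    generate_step(transcript) is a hypothetical LLM call returning the next
    text chunk, which may contain a ```python ... ``` block to execute.
    """
    transcript = f"Problem: {problem}\n"
    for _ in range(max_rounds):
        step = generate_step(transcript)
        transcript += step
        # Stop once the model emits a final answer.
        if "Final Answer:" in step:
            break
        # Otherwise extract any code block, run it, and feed its printed
        # output back into the transcript as an observation.
        match = re.search(r"```python\n(.*?)```", step, re.DOTALL)
        if match:
            buf = io.StringIO()
            namespace = {}
            with contextlib.redirect_stdout(buf):
                exec(match.group(1), namespace)  # sandbox this in practice
            transcript += f"\nObservation: {buf.getvalue()}\n"
    return transcript
</code>

In the actual system, execution happens in an isolated interpreter rather than via ''exec'', and the loop terminates when the model produces its boxed final answer.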
===== Key Results =====

^ Model ^ MATH Accuracy ^ Notes ^
| ToRA-7B | 44.6% | Surpasses WizardMath-70B by 22% absolute |
| ToRA-Code-34B | >50% | First open-source model above 50% on MATH |
| ToRA-70B | Rivals GPT-4 | Outperforms GPT-4 CoT on MATH |

Key findings across 10 mathematical reasoning benchmarks:

  * **13-19% absolute improvement** over prior open-source models across all datasets and model scales(([[https://microsoft.github.io/ToRA/|ToRA Project Page (Microsoft Research)]]))
  * Tool integration is most beneficial for computation-heavy problems (algebra, number theory)
  * Output space shaping further improves accuracy by ensuring syntactically valid tool calls

===== Code Example =====

<code python>
# ToRA-style interleaved reasoning and code execution
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('llhf/ToRA-7B')
tokenizer = AutoTokenizer.from_pretrained('llhf/ToRA-7B')

problem = 'Find all integer solutions to x^2 + y^2 = 25'

# ToRA interleaves reasoning with executable code blocks
prompt = (
    'Problem: ' + problem + '\n'
    'Reasoning: We need integer pairs (x, y) where x^2 + y^2 = 25.\n'
    'The max value of |x| or |y| is 5 since 5^2 = 25.\n'
    'Let me enumerate systematically:\n\n'
    'solutions = []\n'
    'for x in range(-5, 6):\n'
    '    for y in range(-5, 6):\n'
    '        if x**2 + y**2 == 25:\n'
    '            solutions.append((x, y))\n'
    'print(solutions)'
)

inputs = tokenizer(prompt, return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
</code>

===== See Also =====

  * [[critic_self_correction|CRITIC: Self-Correction with Tool Interaction]]
  * [[chain_of_thought|Chain-of-Thought Reasoning]]
  * [[tool_use_agents|Tool-Use in LLM Agents]]

===== References =====