====== ToRA: Tool-Integrated Reasoning Agents for Mathematical Problem Solving ======
ToRA (Tool-integrated Reasoning Agents) is a series of LLM-based agents that solve complex mathematical problems by **interleaving natural language reasoning with program-based tool execution**(([[https://arxiv.org/abs/2309.17452|Gou et al. (2023) - ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving]])). Introduced by Gou et al. (2023) at ICLR 2024, ToRA achieves state-of-the-art results on mathematical benchmarks by combining the analytical clarity of chain-of-thought reasoning with the computational precision of code execution.
===== Overview =====
Pure natural language reasoning struggles with precise computation, while pure program synthesis misses high-level mathematical insight. ToRA bridges this gap through a hybrid approach: the model reasons verbally about the problem structure, writes code to perform calculations using external tools, observes results, and continues reasoning.
On MATH with a LLaMA-2 base model, tool-integrated training improves over single-modality baselines:

$$\Delta_{\text{ToRA}} = +29.0\%\ \text{(vs. rationale-only)},\ +6.7\%\ \text{(vs. program-only)}$$
===== Methodology =====
The inference procedure alternates reasoning, code generation, and tool execution until the problem is solved (Mermaid notation):

<code>
graph LR
    A[Math Problem] --> B[NL Reasoning Step]
    B --> C[Code Generation]
    C --> D[Tool Execution]
    D --> E[Observe Output]
    E --> F{Problem Solved?}
    F -->|No| B
    F -->|Yes| G[Final Answer]
</code>
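The loop above can be sketched in a few lines of Python. This is a minimal illustration, not ToRA's released scaffolding: the helper names ''run_code'' and ''tora_loop'' and the ''fake_model'' stub are invented for this example.

```python
import io
import contextlib

def run_code(code: str) -> str:
    """Execute a generated code block and capture its stdout (the 'tool' step)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def tora_loop(problem, generate_step, max_rounds=5):
    """Alternate model generation and tool execution until a final answer appears."""
    trajectory = "Problem: " + problem + "\n"
    for _ in range(max_rounds):
        step = generate_step(trajectory)   # reasoning text plus an optional code block
        trajectory += step["text"]
        if step.get("code"):               # code block present -> run it, feed result back
            observation = run_code(step["code"])
            trajectory += "Output: " + observation + "\n"
        if step.get("final"):              # model signals it has the answer
            return step["final"], trajectory
    return None, trajectory

# Toy generator standing in for the model: one tool call, then the final answer.
def fake_model(trajectory):
    if "Output:" not in trajectory:
        return {"text": "Compute 7!/5! with code.\n",
                "code": "import math\nprint(math.factorial(7) // math.factorial(5))"}
    return {"text": "So the answer is 42.\n", "final": "42"}

answer, _ = tora_loop("Evaluate 7!/5!", fake_model)
print(answer)  # -> 42
```

The key design point is that the execution output is appended to the trajectory, so subsequent reasoning steps can condition on observed results.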
Training proceeds in three stages(([[https://openreview.net/forum?id=Ep0TtjVoap|ICLR 2024 Paper]])):
  - **Trajectory Curation**: interactive tool-use trajectories are collected by prompting GPT-4 on math datasets
- **Imitation Learning**: Smaller models are trained on curated trajectories
- **Output Space Shaping**: Reasoning and tool interactions are refined for valid execution
The training objective minimizes cross-entropy over trajectory tokens:
$$\mathcal{L} = -\sum_{t=1}^{T} \log P_\theta(y_t \mid x, y_{<t})$$

where $x$ is the problem statement and $y_{1:T}$ is the interleaved reasoning-code trajectory.
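The objective is a standard autoregressive cross-entropy over the trajectory tokens. A small NumPy sketch of that negative log-likelihood (the function name and toy shapes are illustrative, not from the paper):

```python
import numpy as np

def trajectory_nll(logits, targets):
    """Cross-entropy over a trajectory: -sum_t log P(y_t | x, y_<t).

    logits:  (T, V) unnormalized scores for each of T trajectory positions
    targets: (T,)   ground-truth token ids from the curated trajectory
    """
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].sum()

# Sanity check: uniform logits over V tokens give a loss of T * ln(V).
T, V = 4, 8
loss = trajectory_nll(np.zeros((T, V)), np.zeros(T, dtype=int))
print(round(loss, 4))  # -> 8.3178 (= 4 * ln 8)
```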
===== Key Results =====
^ Model ^ MATH Accuracy ^ Notes ^
| ToRA-7B | 44.6% | Surpasses WizardMath-70B by 22 points absolute |
| ToRA-Code-34B | >50% | First open-source model above 50% on MATH |
| ToRA-70B | Rivals GPT-4 | Outperforms GPT-4 CoT on MATH |
Key findings across 10 mathematical reasoning benchmarks:
* **13-19% absolute improvement** over prior open-source models across all datasets and model scales(([[https://microsoft.github.io/ToRA/|ToRA Project Page (Microsoft Research)]]))
* Tool integration is most beneficial for computation-heavy problems (algebra, number theory)
* Output space shaping further improves accuracy by ensuring syntactically valid tool calls
===== Code Example =====
<code python>
# ToRA-style interleaved reasoning and code execution
from transformers import AutoModelForCausalLM, AutoTokenizer

# ToRA checkpoints are released under the llm-agents organization on the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained('llm-agents/tora-7b-v1.0')
tokenizer = AutoTokenizer.from_pretrained('llm-agents/tora-7b-v1.0')

problem = 'Find all integer solutions to x^2 + y^2 = 25'

# ToRA interleaves natural-language reasoning with executable code blocks
prompt = (
    'Problem: ' + problem + '\n'
    'Reasoning: We need integer pairs (x, y) where x^2 + y^2 = 25.\n'
    'The max value of |x| or |y| is 5 since 5^2 = 25.\n'
    'Let me enumerate systematically:\n\n'
    'solutions = []\n'
    'for x in range(-5, 6):\n'
    '    for y in range(-5, 6):\n'
    '        if x**2 + y**2 == 25:\n'
    '            solutions.append((x, y))\n'
    'print(solutions)'
)

inputs = tokenizer(prompt, return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
</code>
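Generation alone is only half of the agent: a driver must extract the code the model emits, run it, and feed the printed result back into the next prompt. A minimal extraction-and-execution sketch (the ''extract_and_run'' helper and the fenced-block format are assumptions for this example, not ToRA's released scaffolding):

```python
import io
import re
import contextlib

# Match the first fenced code block in a model generation
CODE_BLOCK = re.compile(r"```(?:python)?\n(.*?)```", re.DOTALL)

def extract_and_run(generation: str) -> str:
    """Pull the first fenced code block out of a generation and execute it."""
    match = CODE_BLOCK.search(generation)
    if match is None:
        return ""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(match.group(1), {})
    return buf.getvalue().strip()

fence = "`" * 3  # build the fence marker without literal backtick lines
generation = (
    "Let me enumerate systematically:\n"
    + fence + "python\n"
    "solutions = [(x, y) for x in range(-5, 6) for y in range(-5, 6)\n"
    "             if x**2 + y**2 == 25]\n"
    "print(len(solutions))\n"
    + fence + "\n"
)
print(extract_and_run(generation))  # -> 12
```

The captured stdout ("12" here: four axis points plus eight (±3, ±4)-type pairs) is what gets appended to the trajectory as the observation for the next reasoning step.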
===== See Also =====
* [[critic_self_correction|CRITIC: Self-Correction with Tool Interaction]]
* [[chain_of_thought|Chain-of-Thought Reasoning]]
* [[tool_use_agents|Tool-Use in LLM Agents]]
===== References =====