ToRA (Tool-integrated Reasoning Agents) is a series of LLM-based agents that solve complex mathematical problems by interleaving natural language reasoning with program-based tool execution. Introduced by Gou et al. (2023) at ICLR 2024, ToRA achieves state-of-the-art results on mathematical benchmarks by combining the analytical clarity of chain-of-thought reasoning with the computational precision of code execution.
Pure natural language reasoning struggles with precise computation, while pure program synthesis misses high-level mathematical insight. ToRA bridges this gap through a hybrid approach: the model reasons verbally about the problem structure, writes code to perform calculations using external tools, observes results, and continues reasoning.
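This reason–execute–observe loop can be sketched as follows. All names here (`tora_loop`, `run_code`, `generate`) are illustrative, not from the ToRA codebase; a real implementation would call the fine-tuned model and a sandboxed interpreter rather than `exec`:

```python
import contextlib
import io

def run_code(code: str) -> str:
    # Execute a generated code block and capture its stdout (the tool step).
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def tora_loop(generate, problem, max_rounds=8):
    # `generate` stands in for the LLM: given the trajectory so far, it
    # returns ('rationale', text), ('code', source), or ('answer', text).
    trajectory = ['Problem: ' + problem]
    for _ in range(max_rounds):
        kind, text = generate('\n'.join(trajectory))
        trajectory.append(text)
        if kind == 'code':
            # Feed the execution result back so the model can keep reasoning.
            trajectory.append('Output: ' + run_code(text))
        elif kind == 'answer':
            break
    return '\n'.join(trajectory)
```

The key design point is the feedback edge: tool output is appended to the context, so later reasoning steps can condition on observed results.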
On the MATH benchmark, ToRA fine-tuning of LLaMA-2 improves over single-modality fine-tuning by:
<latex>\Delta_{\text{ToRA}} = +29.0\%\ \text{(vs. rationale-only)},\ +6.7\%\ \text{(vs. program-only)}</latex>
Training proceeds in three stages:
1. Curate interactive tool-use trajectories (ToRA-Corpus) by prompting GPT-4 on GSM8k and MATH training problems.
2. Fine-tune the base model on these trajectories via imitation learning.
3. Apply output space shaping: sample diverse valid trajectories from the fine-tuned model, correct invalid ones with a teacher model, and retrain on the expanded set.
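The output space shaping stage can be sketched as a sample-and-filter loop. This is a simplified illustration, not the paper's code; `sample` and `is_correct` are hypothetical stand-ins for model sampling and answer checking:

```python
def shape_output_space(problems, sample, is_correct, k=8):
    # For each (problem, answer) pair, sample k candidate trajectories from
    # the fine-tuned model and keep only the correct ones as extra training
    # data; in ToRA, invalid trajectories are additionally teacher-corrected.
    shaped = []
    for problem, answer in problems:
        for _ in range(k):
            trajectory = sample(problem)
            if is_correct(trajectory, answer):
                shaped.append((problem, trajectory))
    return shaped
```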
The training objective minimizes cross-entropy over trajectory tokens:
<latex>\mathcal{L} = -\sum_{t=1}^{T} \log P_\theta(y_t | y_{<t}, x)</latex>
where <latex>x</latex> is the problem and <latex>y_{1:T}</latex> is the interleaved reasoning-code trajectory.
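As a concrete check, this objective can be computed directly from per-step logits. A minimal NumPy sketch (`trajectory_nll` is an illustrative name, not a library function):

```python
import numpy as np

def trajectory_nll(logits, targets):
    # Cross-entropy over the interleaved trajectory y_{1:T}:
    # logits has shape (T, V), one row of unnormalized scores per step t
    # for P_theta(y_t | y_<t, x); targets has shape (T,).
    z = logits - logits.max(axis=-1, keepdims=True)           # stabilize
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))  # log-softmax
    return -logp[np.arange(len(targets)), targets].sum()
```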
| Model | MATH Accuracy | Notes |
|---|---|---|
| ToRA-7B | 44.6% | Surpasses WizardMath-70B by +22% absolute |
| ToRA-Code-34B | >50% | First open-source model above 50% on MATH |
| ToRA-70B | Rivals GPT-4 | Outperforms GPT-4 CoT on MATH |
Across 10 mathematical reasoning benchmarks, ToRA models substantially outperform prior open-source models at every model scale.
```python
# ToRA-style interleaved reasoning and code execution
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('llhf/ToRA-7B')
tokenizer = AutoTokenizer.from_pretrained('llhf/ToRA-7B')

problem = 'Find all integer solutions to x^2 + y^2 = 25'

# ToRA interleaves reasoning with executable code blocks
prompt = (
    'Problem: ' + problem + '\n'
    'Reasoning: We need integer pairs (x, y) where x^2 + y^2 = 25.\n'
    'The max value of |x| or |y| is 5 since 5^2 = 25.\n'
    'Let me enumerate systematically:\n\n'
    'solutions = []\n'
    'for x in range(-5, 6):\n'
    '    for y in range(-5, 6):\n'
    '        if x**2 + y**2 == 25:\n'
    '            solutions.append((x, y))\n'
    'print(solutions)'
)

inputs = tokenizer(prompt, return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```