====== ToRA: Tool-Integrated Reasoning Agents for Mathematical Problem Solving ======
ToRA (Tool-integrated Reasoning Agents) is a series of LLM-based agents that solve complex mathematical problems by **interleaving natural language reasoning with program-based tool execution**(([[https://arxiv.org/abs/2309.17452|Gou et al. (2023) - ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving]])). Introduced by Gou et al. (2023) at ICLR 2024, ToRA achieves state-of-the-art results on mathematical benchmarks by combining the analytical clarity of chain-of-thought reasoning with the computational precision of code execution.
===== Overview =====
Pure natural language reasoning struggles with precise computation, while pure program synthesis misses high-level mathematical insight. ToRA bridges this gap through a hybrid approach: the model reasons verbally about the problem structure, writes code to perform calculations using external tools, observes results, and continues reasoning.
On MATH with a LLaMA-2 base model, tool-integrated training improves over single-modality baselines:

$$\Delta_{\text{ToRA}} = +29.0\%\ \text{(vs. rationale-only)},\ +6.7\%\ \text{(vs. program-only)}$$
===== Methodology =====
The inference procedure alternates reasoning, code generation, and tool execution until the problem is solved (Mermaid notation):

<code>
graph LR
    A[Math Problem] --> B[NL Reasoning Step]
    B --> C[Code Generation]
    C --> D[Tool Execution]
    D --> E[Observe Output]
    E --> F{Problem Solved?}
    F -->|No| B
    F -->|Yes| G[Final Answer]
</code>
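The loop above can be sketched in a few lines of Python. This is a minimal illustration, not ToRA's released scaffolding: the helper names ''run_code'' and ''tora_loop'' and the ''fake_model'' stub are invented for this example.

```python
import io
import contextlib

def run_code(code: str) -> str:
    """Execute a generated code block and capture its stdout (the 'tool' step)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def tora_loop(problem, generate_step, max_rounds=5):
    """Alternate model generation and tool execution until a final answer appears."""
    trajectory = "Problem: " + problem + "\n"
    for _ in range(max_rounds):
        step = generate_step(trajectory)   # reasoning text plus an optional code block
        trajectory += step["text"]
        if step.get("code"):               # code block present -> run it, feed result back
            observation = run_code(step["code"])
            trajectory += "Output: " + observation + "\n"
        if step.get("final"):              # model signals it has the answer
            return step["final"], trajectory
    return None, trajectory

# Toy generator standing in for the model: one tool call, then the final answer.
def fake_model(trajectory):
    if "Output:" not in trajectory:
        return {"text": "Compute 7!/5! with code.\n",
                "code": "import math\nprint(math.factorial(7) // math.factorial(5))"}
    return {"text": "So the answer is 42.\n", "final": "42"}

answer, _ = tora_loop("Evaluate 7!/5!", fake_model)
print(answer)  # -> 42
```

The key design point is that the execution output is appended to the trajectory, so subsequent reasoning steps can condition on observed results.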
Training proceeds in three stages(([[https://openreview.net/forum?id=Ep0TtjVoap|ICLR 2024 Paper]])):
  - **Trajectory Curation**: interactive tool-use trajectories are collected by prompting GPT-4 on math datasets
- **Imitation Learning**: Smaller models are trained on curated trajectories
- **Output Space Shaping**: Reasoning and tool interactions are refined for valid execution
The training objective minimizes cross-entropy over trajectory tokens:
$$\mathcal{L} = -\sum_{t=1}^{T} \log P_\theta(y_t \mid x, y_{<t})$$

where $x$ is the problem statement and $y_{1:T}$ is the interleaved reasoning-code trajectory.
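The objective is a standard autoregressive cross-entropy over the trajectory tokens. A small NumPy sketch of that negative log-likelihood (the function name and toy shapes are illustrative, not from the paper):

```python
import numpy as np

def trajectory_nll(logits, targets):
    """Cross-entropy over a trajectory: -sum_t log P(y_t | x, y_<t).

    logits:  (T, V) unnormalized scores for each of T trajectory positions
    targets: (T,)   ground-truth token ids from the curated trajectory
    """
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].sum()

# Sanity check: uniform logits over V tokens give a loss of T * ln(V).
T, V = 4, 8
loss = trajectory_nll(np.zeros((T, V)), np.zeros(T, dtype=int))
print(round(loss, 4))  # -> 8.3178 (= 4 * ln 8)
```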
===== Key Results =====
^ Model ^ MATH Accuracy ^ Notes ^
| ToRA-7B | 44.6% | Surpasses WizardMath-70B by 22 points absolute |
| ToRA-Code-34B | >50% | First open-source model above 50% on MATH |
| ToRA-70B | Rivals GPT-4 | Outperforms GPT-4 CoT on MATH |
Key findings across 10 mathematical reasoning benchmarks:
* **13-19% absolute improvement** over prior open-source models across all datasets and model scales(([[https://microsoft.github.io/ToRA/|ToRA Project Page (Microsoft Research)]]))
* Tool integration is most beneficial for computation-heavy problems (algebra, number theory)
* Output space shaping further improves accuracy by ensuring syntactically valid tool calls
===== Code Example =====
<code python>
# ToRA-style interleaved reasoning and code execution
from transformers import AutoModelForCausalLM, AutoTokenizer

# ToRA checkpoints are released under the llm-agents organization on the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained('llm-agents/tora-7b-v1.0')
tokenizer = AutoTokenizer.from_pretrained('llm-agents/tora-7b-v1.0')

problem = 'Find all integer solutions to x^2 + y^2 = 25'

# ToRA interleaves natural-language reasoning with executable code blocks
prompt = (
    'Problem: ' + problem + '\n'
    'Reasoning: We need integer pairs (x, y) where x^2 + y^2 = 25.\n'
    'The max value of |x| or |y| is 5 since 5^2 = 25.\n'
    'Let me enumerate systematically:\n\n'
    'solutions = []\n'
    'for x in range(-5, 6):\n'
    '    for y in range(-5, 6):\n'
    '        if x**2 + y**2 == 25:\n'
    '            solutions.append((x, y))\n'
    'print(solutions)'
)

inputs = tokenizer(prompt, return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
</code>
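Generation alone is only half of the agent: a driver must extract the code the model emits, run it, and feed the printed result back into the next prompt. A minimal extraction-and-execution sketch (the ''extract_and_run'' helper and the fenced-block format are assumptions for this example, not ToRA's released scaffolding):

```python
import io
import re
import contextlib

# Match the first fenced code block in a model generation
CODE_BLOCK = re.compile(r"```(?:python)?\n(.*?)```", re.DOTALL)

def extract_and_run(generation: str) -> str:
    """Pull the first fenced code block out of a generation and execute it."""
    match = CODE_BLOCK.search(generation)
    if match is None:
        return ""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(match.group(1), {})
    return buf.getvalue().strip()

fence = "`" * 3  # build the fence marker without literal backtick lines
generation = (
    "Let me enumerate systematically:\n"
    + fence + "python\n"
    "solutions = [(x, y) for x in range(-5, 6) for y in range(-5, 6)\n"
    "             if x**2 + y**2 == 25]\n"
    "print(len(solutions))\n"
    + fence + "\n"
)
print(extract_and_run(generation))  # -> 12
```

The captured stdout ("12" here: four axis points plus eight (±3, ±4)-type pairs) is what gets appended to the trajectory as the observation for the next reasoning step.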
===== See Also =====
* [[critic_self_correction|CRITIC: Self-Correction with Tool Interaction]]
* [[chain_of_thought|Chain-of-Thought Reasoning]]
* [[tool_use_agents|Tool-Use in LLM Agents]]
===== References =====