ToRA (Tool-integrated Reasoning Agents) is a series of LLM-based agents that solve complex mathematical problems by interleaving natural language reasoning with program-based tool execution. Introduced by Gou et al. (2023) at ICLR 2024, ToRA achieves state-of-the-art results on mathematical benchmarks by combining the analytical clarity of chain-of-thought reasoning with the computational precision of code execution.
Pure natural language reasoning struggles with precise computation, while pure program synthesis misses high-level mathematical insight. ToRA bridges this gap through a hybrid approach: the model reasons verbally about the problem structure, writes code to perform calculations using external tools, observes results, and continues reasoning.
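This reason–execute–observe loop can be sketched as follows. All names here (`tora_loop`, `run_code`, `generate`) are illustrative, not from the ToRA codebase; a real implementation would call the fine-tuned model and a sandboxed interpreter rather than `exec`:

```python
import contextlib
import io

def run_code(code: str) -> str:
    # Execute a generated code block and capture its stdout (the tool step).
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def tora_loop(generate, problem, max_rounds=8):
    # `generate` stands in for the LLM: given the trajectory so far, it
    # returns ('rationale', text), ('code', source), or ('answer', text).
    trajectory = ['Problem: ' + problem]
    for _ in range(max_rounds):
        kind, text = generate('\n'.join(trajectory))
        trajectory.append(text)
        if kind == 'code':
            # Feed the execution result back so the model can keep reasoning.
            trajectory.append('Output: ' + run_code(text))
        elif kind == 'answer':
            break
    return '\n'.join(trajectory)
```

The key design point is the feedback edge: tool output is appended to the context, so later reasoning steps can condition on observed results.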
On the MATH benchmark, ToRA fine-tuning of LLaMA-2 improves over single-modality fine-tuning by:
<latex>\Delta_{\text{ToRA}} = +29.0\%\ \text{(vs. rationale-only)},\ +6.7\%\ \text{(vs. program-only)}</latex>
Training proceeds in three stages:
1. Curate interactive tool-use trajectories (ToRA-Corpus) by prompting GPT-4 on GSM8k and MATH training problems.
2. Fine-tune the base model on these trajectories via imitation learning.
3. Apply output space shaping: sample diverse valid trajectories from the fine-tuned model, correct invalid ones with a teacher model, and retrain on the expanded set.
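The output space shaping stage can be sketched as a sample-and-filter loop. This is a simplified illustration, not the paper's code; `sample` and `is_correct` are hypothetical stand-ins for model sampling and answer checking:

```python
def shape_output_space(problems, sample, is_correct, k=8):
    # For each (problem, answer) pair, sample k candidate trajectories from
    # the fine-tuned model and keep only the correct ones as extra training
    # data; in ToRA, invalid trajectories are additionally teacher-corrected.
    shaped = []
    for problem, answer in problems:
        for _ in range(k):
            trajectory = sample(problem)
            if is_correct(trajectory, answer):
                shaped.append((problem, trajectory))
    return shaped
```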
The training objective minimizes cross-entropy over trajectory tokens:
<latex>\mathcal{L} = -\sum_{t=1}^{T} \log P_\theta(y_t | y_{<t}, x)</latex>
where <latex>x</latex> is the problem and <latex>y_{1:T}</latex> is the interleaved reasoning-code trajectory.
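As a concrete check, this objective can be computed directly from per-step logits. A minimal NumPy sketch (`trajectory_nll` is an illustrative name, not a library function):

```python
import numpy as np

def trajectory_nll(logits, targets):
    # Cross-entropy over the interleaved trajectory y_{1:T}:
    # logits has shape (T, V), one row of unnormalized scores per step t
    # for P_theta(y_t | y_<t, x); targets has shape (T,).
    z = logits - logits.max(axis=-1, keepdims=True)           # stabilize
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))  # log-softmax
    return -logp[np.arange(len(targets)), targets].sum()
```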
| Model | MATH Accuracy | Notes |
|---|---|---|
| ToRA-7B | 44.6% | Surpasses WizardMath-70B by +22% absolute |
| ToRA-Code-34B | >50% | First open-source model above 50% on MATH |
| ToRA-70B | Rivals GPT-4 | Outperforms GPT-4 CoT on MATH |
Across 10 mathematical reasoning benchmarks, ToRA models substantially outperform prior open-source models at every model scale.
```python
# ToRA-style interleaved reasoning and code execution
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('llhf/ToRA-7B')
tokenizer = AutoTokenizer.from_pretrained('llhf/ToRA-7B')

problem = 'Find all integer solutions to x^2 + y^2 = 25'

# ToRA interleaves reasoning with executable code blocks
prompt = (
    'Problem: ' + problem + '\n'
    'Reasoning: We need integer pairs (x, y) where x^2 + y^2 = 25.\n'
    'The max value of |x| or |y| is 5 since 5^2 = 25.\n'
    'Let me enumerate systematically:\n\n'
    'solutions = []\n'
    'for x in range(-5, 6):\n'
    '    for y in range(-5, 6):\n'
    '        if x**2 + y**2 == 25:\n'
    '            solutions.append((x, y))\n'
    'print(solutions)'
)

inputs = tokenizer(prompt, return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```