Automatic Reasoning and Tool-use (ART) is a framework that enables frozen large language models to automatically generate multi-step reasoning programs and integrate external tool outputs without requiring fine-tuning or hand-crafted task-specific demonstrations. ART automates both chain-of-thought decomposition and tool selection using a task library approach.1)
Traditional chain-of-thought prompting and tool-use approaches rely on carefully hand-crafted, task-specific demonstrations and manually scripted interleaving between model generations and tool calls.2) This manual effort limits scalability and requires expertise for each new task. ART automates this entire process while keeping the underlying LLM frozen.
ART operates in two primary phases:
Given a new task, ART selects relevant demonstrations of multi-step reasoning and tool use from a task library – a structured repository containing examples of how to decompose and solve various task types with appropriate tools.
At test time, ART dynamically generates reasoning steps as a program. When an external tool is needed, generation pauses, the tool is called with the arguments the model has produced, the tool's output is integrated back into the program, and generation resumes.
This seamless interleaving of reasoning and tool use happens automatically without manual scripting.
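The pause-call-resume loop described above can be sketched in a few lines. Everything here is illustrative: the `[tool(args)]` marker syntax, the `fake_llm` stand-in for a frozen model, and the tool registry are invented for this example and are not ART's actual interface.

```python
import re

# Toy tool registry; in ART the tool set also includes code execution,
# search, and other external calls.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

# Illustrative marker convention for a tool call embedded in generated text.
TOOL_CALL = re.compile(r"\[(\w+)\((.+?)\)\]")

def fake_llm(transcript):
    """Stand-in for a frozen LLM: emits the next step of a fixed program."""
    script = [
        "Q: What is 17 * 24? I will use a tool. [calculator(17 * 24)]",
        "So the answer is 408. [END]",
    ]
    for step in script:          # emit the first step not yet in the transcript
        if step not in transcript:
            return step
    return "[END]"

def run_program(prompt, llm=fake_llm, max_steps=5):
    """Interleave model generation with tool calls until the program ends."""
    transcript = prompt
    for _ in range(max_steps):
        chunk = llm(transcript)
        match = TOOL_CALL.search(chunk)
        if match:                # pause generation, run the tool, splice in output
            name, args = match.groups()
            chunk += f" -> {TOOLS[name](args)}\n"
        transcript += chunk
        if "[END]" in chunk:
            break
    return transcript
```

Running `run_program("")` produces a transcript in which the calculator's result (`408`) appears immediately after the tool-call marker, before the model's final answer step, mirroring the automatic interleaving the framework performs.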
ART's extensibility depends on two key components: a task library of demonstrations showing how to decompose and solve different task types, and a tool library of external tools (such as a calculator, code execution, or search) that generated programs can call.3)
The framework uses a selection mechanism to identify the most appropriate demonstrations and tools for each new task, enabling zero-shot generalization.
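As a rough illustration of this selection step, the sketch below ranks library entries against a new task description. The word-overlap similarity and the library entries are simplified stand-ins for whatever selection procedure an actual implementation uses.

```python
# Hypothetical task library: each entry pairs a task description with a
# demonstration of how to decompose and solve that task type.
TASK_LIBRARY = [
    {"task": "arithmetic word problem", "demo": "decompose, then call calculator"},
    {"task": "string manipulation", "demo": "decompose, then run code"},
    {"task": "fact lookup question", "demo": "decompose, then call search"},
]

def similarity(a, b):
    """Crude word-overlap (Jaccard) score between two task descriptions."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def select_demonstrations(new_task, library=TASK_LIBRARY, k=2):
    """Pick the k library entries whose task descriptions best match."""
    ranked = sorted(library, key=lambda e: similarity(new_task, e["task"]),
                    reverse=True)
    return ranked[:k]
```

For a new task such as "a tricky arithmetic problem", the arithmetic demonstration ranks first, so its decomposition style and tool choice seed the prompt for the unseen task.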
A distinguishing feature of ART is its use of frozen LLMs – the underlying language model requires no fine-tuning or parameter updates. This offers several advantages: there is no training cost when extending to new tasks, any sufficiently capable model can be used as-is, and improved models can be swapped in without retraining.
ART is designed to support human feedback. Humans can improve performance by correcting errors in the reasoning steps stored in the task library and by adding new tools to the tool library.
This makes ART an extensible system that improves with minimal human intervention.
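This feedback loop can be sketched as simple edits to the two libraries. The data structures and function names below are illustrative assumptions, not ART's real API.

```python
# Hypothetical stored demonstration: a list of reasoning steps, one of
# which references a tool via an invented "[tool(args)]" marker.
task_library = {
    "date arithmetic": [
        "Q: How many days between the two dates?",
        "Step 1: parse the dates",
        "Step 2: [calendar(diff)]",
    ],
}
tool_registry = {"calendar": lambda args: "stub"}

def add_tool(name, fn):
    """Register a new tool; demonstrations can then reference it."""
    tool_registry[name] = fn

def edit_demonstration(task, step_index, new_step):
    """Correct a single reasoning step in a stored demonstration."""
    task_library[task][step_index] = new_step
```

Because the model itself stays frozen, both kinds of feedback take effect immediately: the next prompt assembled from the library already reflects the human's correction.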
ART demonstrates substantial improvements across major benchmarks:4)
| Comparison | Improvement |
|---|---|
| ART vs. few-shot prompting (unseen tasks) | +10.8% average |
| Tool-use contribution (additional) | +12.3 percentage points |
| ART vs. hand-crafted CoT | Matches on majority of tasks |
| ART + human feedback vs. hand-crafted CoT | Exceeds performance |
Evaluated on BigBench and MMLU benchmarks, ART excels particularly on arithmetic and algorithmic reasoning tasks.
| Method | Approach | ART Advantage |
|---|---|---|
| Few-shot prompting | Fixed examples, no tools | +10.8% from automated decomposition + tools |
| Standard CoT | Manual reasoning chains | Automated, no per-task engineering |
| Hand-crafted CoT + tools | Manual scripting of tool calls | Fully automated tool selection and integration |
| PAL | Code generation for computation | Broader tool set beyond code execution |
The key advantage is that ART automates both reasoning decomposition and tool selection without requiring manual crafting for each new task.