Toolformer

Toolformer is a research approach introduced by Schick et al. (Meta AI) in February 2023 that trains language models to autonomously decide when and how to call external tools by generating API calls inline within text sequences. The model learns tool usage in a self-supervised manner, requiring only a handful of demonstrations per API and no explicit human annotations of when tools should be used. Toolformer demonstrated that smaller models augmented with tools can match or exceed the performance of much larger models on tasks requiring factual lookup or computation.

graph LR
    T[Text Dataset] --> S[Sample API Calls]
    S --> X[Execute Calls]
    X --> F[Filter by Perplexity]
    F --> FT[Fine-Tune Model]
    FT --> I[Inference with Tools]
    style T fill:#69f,stroke:#333
    style I fill:#6f6,stroke:#333

Self-Supervised Training Approach

Toolformer's key innovation is its training methodology:

  1. API Call Sampling: Given a dataset of text, the model samples potential positions where API calls could be inserted, generating candidate calls with appropriate arguments
  2. Execution: Each candidate API call is actually executed against the real tool
  3. Filtering: Only API calls that reduce perplexity on subsequent tokens are retained, meaning only calls that genuinely help predict future text survive
  4. Fine-tuning: The model is fine-tuned on the filtered dataset, learning to generate API call tokens naturally within text sequences

This approach means the model learns when a tool is useful (not just how to use it), without requiring human-labeled training data specifying tool usage points.

Perplexity Filtering Formula

The core filtering criterion compares the perplexity of tokens following a potential API call position, with and without the tool result. A candidate API call $c$ with result $r$ at position $i$ in text $x$ is retained if:

$$L_i(\emptyset) - L_i(c, r) \geq \tau$$

where $L_i$ denotes the cross-entropy loss over subsequent tokens:

$$L_i(c, r) = -\sum_{j=i}^{n} \log p_\theta(x_j \mid x_{<i}, c, r, x_{i:j-1})$$

and $L_i(\emptyset)$ is the loss without any API call. The threshold $\tau$ controls how much a tool call must help to be kept. Only calls where the tool result sufficiently reduces the loss on future tokens are included in training, ensuring the model learns to invoke tools precisely when they provide useful information.
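With per-token log-probabilities in hand, the criterion reduces to a single comparison. The probabilities below are made-up numbers for illustration, not values from the paper:

```python
import math

def cross_entropy(logprobs: list[float]) -> float:
    """L_i: negative sum of log-probabilities of the tokens after position i."""
    return -sum(logprobs)

# Hypothetical per-token probabilities for a continuation like "... 8 people came",
# scored without and with the Calculator result in the context.
without_call = [math.log(p) for p in (0.05, 0.6, 0.7)]  # model must guess "8"
with_call    = [math.log(p) for p in (0.90, 0.6, 0.7)]  # "8" is now near-certain

tau = 1.0
loss_reduction = cross_entropy(without_call) - cross_entropy(with_call)
print(f"loss reduction: {loss_reduction:.2f}, keep call: {loss_reduction >= tau}")
```

Here the reduction is log(0.90/0.05) ≈ 2.89 nats, well above the threshold, so the call would be retained.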

Python Example: Inline API Call Generation

import re
from openai import OpenAI
 
client = OpenAI()
 
# Simulate the Toolformer pattern: model generates text with inline API calls
# API calls appear as [ToolName(args) -> result] tokens in the output
 
# Mock tool backends (eval here is a demo shortcut; never eval untrusted input)
TOOL_IMPLEMENTATIONS = {
    "Calculator": lambda expr: str(eval(expr)),
    "Search": lambda query: "Python 3.12 was released on October 2, 2023.",
    "Calendar": lambda: "Today is 2025-03-24, Monday.",
}
 
def execute_inline_calls(text: str) -> str:
    """Parse and execute Toolformer-style inline API calls in generated text."""
    pattern = r"\[(\w+)\(([^)]*)\)\]"
    def replacer(match):
        tool_name, args = match.group(1), match.group(2)
        if tool_name in TOOL_IMPLEMENTATIONS:
            func = TOOL_IMPLEMENTATIONS[tool_name]
            result = func(args) if args else func()
            return f"[{tool_name}({args}) -> {result}]"
        return match.group(0)
    return re.sub(pattern, replacer, text)
 
def toolformer_generate(prompt: str) -> str:
    """Generate text that may contain inline tool calls, then execute them."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": (
            "You can insert tool calls inline using this syntax: [ToolName(args)]\n"
            "Available tools: [Calculator(expr)], [Search(query)], [Calendar()]\n"
            "Insert them naturally where they help answer accurately."
        )}, {"role": "user", "content": prompt}],
    )
    raw_output = resp.choices[0].message.content
    print(f"Raw: {raw_output}")
    # Execute any inline API calls and substitute results
    resolved = execute_inline_calls(raw_output)
    print(f"Resolved: {resolved}")
    return resolved
 
toolformer_generate("What is 347 * 23, and when was the latest Python released?")

Tools Incorporated

Toolformer demonstrated integration with five tools:

  * A calculator for arithmetic
  * A question answering system
  * A Wikipedia search engine
  * A machine translation system
  * A calendar returning the current date

API calls are represented as special tokens within the text sequence, e.g. [Calculator(3+5) → 8], allowing the model to seamlessly interleave tool use with generation.

Key Results

Influence on Later Work

Toolformer established several principles that shaped subsequent tool-augmented AI research:

Limitations

See Also

References