Core Concepts
Reasoning Techniques
Memory Systems
Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools & Products
Safety & Governance
Evaluation
Research
Development
Meta
Tool Learning with Foundation Models is a comprehensive survey by Qin et al. (2023) that formalizes how large language models can serve as intelligent controllers that leverage external tools to overcome their inherent limitations. The survey establishes a unified framework covering tool creation, selection, invocation, and evaluation, drawing on cognitive science to ground the paradigm in human tool-use evolution.
Foundation models excel at language understanding and generation but struggle with precise computation, real-time data access, and physical interaction. Tool learning addresses these gaps by positioning the LLM as an orchestrator that decomposes tasks and delegates specialized operations to external tools. This mirrors how human intelligence evolved to extend biological capabilities through tool creation and use.
The survey grounds tool learning in cognitive science, tracing tool use from early hominid stone tools (~3.3 million years ago) to modern computational tools, and argues that foundation models replicate the key cognitive capabilities behind human tool use through emergent abilities such as in-context learning and chain-of-thought reasoning.
The framework comprises five interacting components, and its core pipeline formalizes four stages: tool creation, selection, invocation, and evaluation. Tool selection, for instance, is cast as choosing the subset $T^*$ of the available tool set $T$ that maximizes utility for a user instruction $I$:
$$T^* = \arg\max_{T' \subseteq T} \text{Utility}(T', I)$$
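A minimal sketch of this selection objective, assuming a hand-written utility score (capability coverage minus a per-tool cost); the tool names, capability tags, and scoring function here are illustrative, not from the survey:

```python
from itertools import chain, combinations

# Hypothetical capability tags for each available tool
TOOLS = {
    "calculator": {"arithmetic"},
    "web_search": {"current_events"},
    "code_interpreter": {"arithmetic", "plotting"},
}

def utility(subset, required, cost_per_tool=0.1):
    """Toy Utility(T', I): capabilities covered minus a size penalty."""
    covered = set().union(*(TOOLS[t] for t in subset)) if subset else set()
    return len(covered & required) - cost_per_tool * len(subset)

def select_tools(required):
    """Brute-force argmax over all subsets T' of the tool set T."""
    names = list(TOOLS)
    subsets = chain.from_iterable(
        combinations(names, k) for k in range(len(names) + 1)
    )
    return max(subsets, key=lambda s: utility(s, required))

# An instruction needing arithmetic and plotting is best served by the
# single tool covering both (one tool's cost beats two tools' cost).
best = select_tools({"arithmetic", "plotting"})  # → ("code_interpreter",)
```

Real systems replace the brute-force search with retrieval over tool descriptions, but the argmax-over-subsets structure is the same.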
Tools are categorized by their interaction modality:
| Category | Description | Examples |
|---|---|---|
| Perception | Convert raw data into structured representations | OCR, speech-to-text, image captioning |
| Action | Execute operations via APIs or commands | Web search, code interpreters, robot control |
| Computation | Perform numerical or symbolic calculations | Calculators, Wolfram Alpha, simulators |
| Data | Retrieve, store, or manage information | Databases, knowledge graphs, vector stores |
This taxonomy highlights the complementary relationship: tools handle precise low-level operations while models manage high-level orchestration.
```python
import json

import openai
import requests

# Define available tools with JSON-schema function specs
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate a mathematical expression",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Math expression to evaluate",
                    }
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"],
            },
        },
    },
]


def execute_tool(name, args):
    """Dispatch a tool call to the matching local implementation."""
    if name == "calculator":
        return str(eval(args["expression"]))  # simplified; unsafe for untrusted input
    elif name == "web_search":
        return requests.get(
            "https://api.search.example/v1/search",
            params={"q": args["query"]},
        ).json()["results"][0]["snippet"]
    raise ValueError(f"Unknown tool: {name}")


def tool_augmented_generation(query, client):
    """Loop: let the model call tools until it produces a final answer."""
    messages = [{"role": "user", "content": query}]
    while True:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=messages,
            tools=TOOLS,
            tool_choice="auto",
        )
        msg = response.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content
        for call in msg.tool_calls:
            result = execute_tool(
                call.function.name, json.loads(call.function.arguments)
            )
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
```
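The `eval` call in the calculator branch above is a stand-in; evaluating model-supplied expressions with `eval` is unsafe. A safer sketch uses Python's `ast` module to walk the parsed expression and allow only a whitelist of arithmetic operators (the operator set chosen here is an illustrative assumption):

```python
import ast
import operator

# Whitelisted binary and unary operators
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate an arithmetic expression without exposing eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Disallowed syntax: {type(node).__name__}")
    return walk(ast.parse(expression, mode="eval"))

# safe_eval("2 * (3 + 4)") → 14
# safe_eval("__import__('os')") → raises ValueError
```

Anything outside the whitelist, including attribute access, function calls, and names, raises rather than executes.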
Foundation models are effective controllers because they combine several complementary strengths: broad language understanding, the ability to decompose tasks into sub-goals, and emergent in-context reasoning that lets them learn tool interfaces from descriptions alone.
Key benefits of the tool-augmented approach include interpretability (tool calls expose reasoning), robustness (verifiable API outputs reduce hallucination), and efficiency (offloading compute-intensive sub-tasks).
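The interpretability benefit can be made concrete by recording every tool invocation in an audit trace that can be inspected after the fact; a minimal sketch, where the wrapper and trace format are illustrative assumptions rather than anything specified in the survey:

```python
import time

def traced(tool_fn):
    """Wrap a tool executor so every call is appended to an audit trace."""
    trace = []
    def wrapper(name, args):
        result = tool_fn(name, args)
        trace.append({
            "ts": time.time(),
            "tool": name,
            "args": args,
            "result": result,
        })
        return result
    wrapper.trace = trace  # exposed for post-hoc inspection
    return wrapper

# Stub executor standing in for a real execute_tool
@traced
def run_tool(name, args):
    return {"echo": args}

run_tool("calculator", {"expression": "1 + 1"})
# run_tool.trace now holds one entry per tool call, so each
# externalized step of the model's reasoning is inspectable.
```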
The survey evaluates 18 representative tools across the taxonomy: