Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
LATM (Large Language Models as Tool Makers) is a cost-efficient agent framework introduced by Cai et al. (2023) that implements a division of labor: a strong LLM (GPT-4) creates reusable Python tools, while a weaker LLM (GPT-3.5) uses them for inference. The paper (271 citations) demonstrates that this tool-making/tool-using paradigm achieves near-GPT-4 performance at a fraction of the cost by amortizing the expense of tool creation across many lightweight invocations.
LATM draws an analogy to human technological evolution: sophisticated tools are created once by skilled craftspeople, then used repeatedly by the general population. The framework separates the cognitive burden of tool creation from tool application.
The cost model motivates the approach. For $n$ problem instances, direct GPT-4 inference costs:
$$C_{\text{direct}} = n \cdot c_{\text{GPT-4}}$$
With LATM, the amortized cost becomes:
$$C_{\text{LATM}} = k \cdot c_{\text{GPT-4}} + n \cdot c_{\text{GPT-3.5}}$$
where $k$ is the small number of demonstrations used for tool creation. Since $c_{\text{GPT-3.5}} \ll c_{\text{GPT-4}}$ and $k \ll n$, LATM achieves substantial savings.
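The break-even arithmetic can be computed directly. The per-call prices below are made-up placeholders (real pricing is token-based and varies); only the ratio matters:

```python
def direct_cost(n, c_gpt4):
    """Cost of solving all n instances directly with the strong model."""
    return n * c_gpt4

def latm_cost(n, k, c_gpt4, c_gpt35):
    """Amortized LATM cost: k tool-making calls on the strong model
    plus n tool-using calls on the weak model."""
    return k * c_gpt4 + n * c_gpt35

# Illustrative (made-up) per-call costs: strong model 20x the weak one.
c_gpt4, c_gpt35 = 0.06, 0.003
n, k = 1000, 3

print(direct_cost(n, c_gpt4))            # total cost, direct GPT-4
print(latm_cost(n, k, c_gpt4, c_gpt35))  # total cost, LATM
```

With these placeholder numbers the LATM total is roughly a twentieth of the direct cost, and the gap widens as $n$ grows while $k$ stays fixed.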
GPT-4 receives $k$ task demonstrations (typically 3) and generates a generic, reusable Python function that solves the demonstrated problem pattern.
The proposed tool is tested on held-out validation examples. If it fails, GPT-4 iterates on the implementation until correctness is achieved.
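A minimal sketch of this verify step (the helper name and the toy task are assumptions, not from the paper): the candidate tool is executed on held-out input/output pairs, and any failures are collected so they can be fed back to the tool maker for another repair iteration.

```python
def validate_tool(tool_fn, validation_set):
    """Run a candidate tool on held-out (inputs, expected_output) pairs.

    Returns (passed, failures); failing cases are what the tool maker
    would see in the next repair iteration.
    """
    failures = []
    for inputs, expected in validation_set:
        try:
            result = tool_fn(*inputs)
        except Exception as exc:  # a crash also counts as a failure
            failures.append((inputs, expected, repr(exc)))
            continue
        if result != expected:
            failures.append((inputs, expected, result))
    return len(failures) == 0, failures

# Toy example: suppose GPT-4 proposed this tool for "sum of a list" tasks.
def proposed_tool(numbers):
    return sum(numbers)

ok, fails = validate_tool(proposed_tool, [(([1, 2, 3],), 6), (([],), 0)])
```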
The verified function is packaged with an API-friendly interface (docstring, type hints, usage examples) and cached in a tool repository for future use.
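The paper does not prescribe an exact wrapper format; the snippet below is one plausible shape for a cached tool entry, pairing the verified callable with an API doc (signature plus docstring) that is pasted into the tool user's prompt. The `wrap_as_api` helper and the scheduling tool are illustrative assumptions:

```python
import inspect

def wrap_as_api(tool_fn):
    """Package a verified tool with the documentation the weak model sees.

    The api_doc string is what gets inserted into the tool user's prompt;
    the callable itself is kept for actual execution.
    """
    signature = inspect.signature(tool_fn)
    api_doc = (
        f"def {tool_fn.__name__}{signature}:\n"
        f'    """{inspect.getdoc(tool_fn)}"""'
    )
    return {"callable": tool_fn, "api_doc": api_doc}

def schedule_meeting(slots_a: list, slots_b: list) -> list:
    """Return the time slots available to both participants.

    Example: schedule_meeting([9, 10, 11], [10, 11, 12]) -> [10, 11]
    """
    return [slot for slot in slots_a if slot in slots_b]

tool = wrap_as_api(schedule_meeting)
```

Keeping the documentation alongside the callable means the tool user never needs to read the implementation, only the interface.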
When a new task arrives, a lightweight dispatcher determines whether an existing cached tool applies or whether a new tool must be created, routing each task to the appropriate tool and minimizing redundant tool creation.
# Simplified LATM framework
class LATM:
    def __init__(self, tool_maker_llm, tool_user_llm):
        self.maker = tool_maker_llm  # GPT-4
        self.user = tool_user_llm    # GPT-3.5
        self.tool_cache = {}

    def make_tool(self, demonstrations, task_type):
        # Phase 1: Propose
        prompt = (
            f"Given these examples:\n{demonstrations}\n"
            "Write a general Python function that solves this type of problem."
        )
        tool_code = self.maker.generate(prompt)

        # Phase 2: Verify on held-out examples
        validation_set = demonstrations[-1:]
        if not self._validate(tool_code, validation_set):
            tool_code = self._iterate_fix(tool_code, validation_set)

        # Phase 3: Wrap and cache
        wrapped_tool = self._wrap_as_api(tool_code, task_type)
        self.tool_cache[task_type] = wrapped_tool
        return wrapped_tool

    def dispatch(self, task):
        task_type = self.user.classify_task(task)
        if task_type not in self.tool_cache:
            demos = self._get_demonstrations(task_type)
            self.make_tool(demos, task_type)
        return self.tool_cache[task_type]

    def solve(self, task):
        tool = self.dispatch(task)
        prompt = (
            f"Use this tool to solve the problem:\n{tool.api_doc}\n\nTask: {task}"
        )
        return self.user.generate(prompt)