====== LATM: Large Language Models as Tool Makers ======
**LATM** (Large Language Models as Tool Makers) is a cost-efficient agent framework introduced by Cai et al. (2023) that implements a division of labor: a **strong LLM (GPT-4) creates reusable Python tools**, while a **weaker LLM (GPT-3.5) uses them** for inference.(([[https://]]))
[[https://]]
===== Tool-Making / Tool-Using Paradigm =====
LATM draws an analogy to human technological evolution: sophisticated tools are created once by skilled craftspeople, then reused broadly by others who need not know how to build them. In the same way, the strong model invests effort once to craft a tool, and the weaker model applies it cheaply thereafter.
The cost model motivates the approach. For $n$ problem instances, direct GPT-4 inference costs:
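(A sketch of the comparison; the symbols $C_{\text{GPT-4}}$, $C_{\text{GPT-3.5}}$ for per-instance inference cost and $C_{\text{make}}$ for the one-time tool-making cost are introduced here for illustration, not taken from the paper.)

$n \cdot C_{\text{GPT-4}}$

whereas LATM costs

$C_{\text{make}} + n \cdot C_{\text{GPT-3.5}}$

so the tool-making cost is amortized across instances and, for large $n$, the per-instance cost approaches $C_{\text{GPT-3.5}}$.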
| </ | </ | ||
===== Code Example =====
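The original listing is not preserved in this revision. Below is a minimal sketch of the make-verify-use loop, with the strong-model call mocked; all names here (''call_strong_model'', ''make_tool'', ''solve'') are illustrative, not from the paper's released code.

<code python>
# Illustrative LATM sketch: the tool maker's model call is mocked out.

def call_strong_model(task_description: str) -> str:
    """Stand-in for a GPT-4 call that returns Python source for a reusable tool."""
    # A real implementation would prompt the strong model with task examples.
    return (
        "def solve(constraints):\n"
        "    # Order object names by their integer rank.\n"
        "    return [name for name, rank in sorted(constraints, key=lambda p: p[1])]\n"
    )

def make_tool(task_description, verification_examples):
    """Tool proposing + tool verification: exec the generated source,
    then check it against held-out examples before deployment."""
    namespace = {}
    exec(call_strong_model(task_description), namespace)
    tool = namespace["solve"]
    for inputs, expected in verification_examples:
        assert tool(inputs) == expected, "generated tool failed verification"
    return tool

# Tool using: the weaker model now only maps new instances onto `solve`.
tool = make_tool(
    "Order objects given rank constraints (Big-Bench logical deduction).",
    [([("robin", 2), ("crow", 1)], ["crow", "robin"])],
)
print(tool([("red", 3), ("green", 1), ("blue", 2)]))  # ['green', 'blue', 'red']
</code>

The verification step is what makes the handoff safe: the weak model never sees an unchecked tool, only a function that already passed the held-out examples.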
===== Key Results =====
  * GPT-4 as tool maker + GPT-3.5 as tool user **matches GPT-4 end-to-end performance**(([[https://]]))
  * Significant cost reduction: tool-making cost is amortized across all task instances
  * Evaluated on **Big-Bench tasks** including logical deduction (e.g., ordering objects from constraints)
  * Tools generalize well across problem instances within the same task family
  * Tool verification ensures correctness before deployment to the weaker model
  * The paradigm extends to any strong/weak model pair(([[https://]]))
| - | + | ||
| - | ===== References ===== | + | |
| - | + | ||
| - | * [[https:// | + | |
| - | * [[https:// | + | |
| - | * [[https:// | + | |
| - | * [[https:// | + | |
===== See Also =====
  * [[reasoning_via_planning|RAP: Reasoning via Planning]]
===== References =====