API-Bank is a comprehensive benchmark introduced by Li et al., 2023 for evaluating the tool-use capabilities of large language models across a diverse set of real-world APIs. It provides both evaluation data and training data, making it one of the first complete benchmarks for tool-augmented LLM research. API-Bank addresses three key questions: how effective are current LLMs at using tools, how can tool use be improved, and what obstacles remain.
API-Bank tests LLMs across three levels of increasing difficulty:
Level 1 - API Calling: Can the model correctly call a single API given its documentation? Tests parameter extraction and formatting.
Level 2 - API Retrieval + Calling: Can the model identify the correct API from a set and call it properly? Tests tool selection from multiple options.
Level 3 - Planning + Retrieval + Calling: Can the model decompose a complex request into multiple API calls, retrieve the right APIs, and execute them in order? Tests multi-step tool-use reasoning.
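A Level-1 check can be illustrated with a small scoring function: given the model's emitted API call as text, parse out the API name and parameters and compare them to a ground-truth annotation. The call syntax and function names below are hypothetical, a minimal sketch rather than API-Bank's actual evaluation format:

```python
import re

def parse_api_call(text):
    """Parse a call like 'GetWeather(city="Paris", unit="celsius")'
    into an (api_name, kwargs) pair. Illustrative syntax only."""
    m = re.match(r'(\w+)\((.*)\)\s*$', text.strip())
    if not m:
        return None
    name, arg_str = m.group(1), m.group(2)
    # Extract keyword arguments of the form key="value"
    kwargs = dict(re.findall(r'(\w+)="([^"]*)"', arg_str))
    return name, kwargs

def score_level1(model_output, expected_name, expected_kwargs):
    """Level-1 style check: did the model pick the right API
    and extract every parameter exactly?"""
    parsed = parse_api_call(model_output)
    if parsed is None:
        return False  # unparseable output counts as failure
    name, kwargs = parsed
    return name == expected_name and kwargs == expected_kwargs

# Hypothetical test case: correct name and parameters pass
print(score_level1('GetWeather(city="Paris", unit="celsius")',
                   'GetWeather',
                   {"city": "Paris", "unit": "celsius"}))  # True
```

Levels 2 and 3 build on the same check, adding a retrieval step over the API pool and, at Level 3, an ordered sequence of such calls.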
API-Bank was among the first benchmarks to provide both evaluation data and training data for tool-augmented LLMs. It established the methodology for subsequent tool-use benchmarks such as ToolBench, MINT, and T-Eval, and informed the development of function-calling APIs and self-supervised tool-learning approaches.