Function calling (also called tool calling) is the mechanism that lets large language models invoke external functions, APIs, and tools in response to user requests. It turns LLMs from passive text generators into action-taking agents that can query databases, call APIs, execute code, and orchestrate multi-step workflows. It is a foundational building block of agentic AI systems.
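Function calling operates as a structured loop between the LLM and external tools: the application sends the user message together with the tool definitions, the model replies either with text or with a structured tool call (a function name plus JSON arguments), the application executes the call and returns the result, and the model uses that result to continue or to produce its final answer.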
Models like GPT-4 and Claude are fine-tuned specifically to detect when functions should be called and to produce correctly formatted JSON output.
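The loop described above can be sketched without any provider SDK. The following is a minimal illustration, not a provider's actual API: `fake_model`, `REGISTRY`, `run_tool_loop`, and the message shapes are all hypothetical stand-ins, and the weather data is canned.

```python
import json

# Hypothetical tool: returns canned data instead of calling a real weather API.
def get_weather(city, unit="celsius"):
    return {"city": city, "temperature": 21, "unit": unit}

# Hypothetical registry mapping tool names to Python callables.
REGISTRY = {"get_weather": get_weather}

def fake_model(messages):
    """Stand-in for an LLM call (assumption: not a real provider API).

    Requests a tool on the first turn, then answers once a tool result
    is present in the conversation.
    """
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "name": "get_weather",
                "arguments": json.dumps({"city": "Tokyo"}),
            }],
        }
    data = json.loads(tool_results[-1]["content"])
    text = f"It is {data['temperature']} degrees {data['unit']} in {data['city']}."
    return {"role": "assistant", "content": text, "tool_calls": []}

def run_tool_loop(user_text, model=fake_model, max_turns=5):
    """Call the model, execute any requested tools, and repeat until it answers."""
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_turns):
        reply = model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            return reply["content"]  # no tool requested: this is the final answer
        for call in reply["tool_calls"]:
            args = json.loads(call["arguments"])
            result = REGISTRY[call["name"]](**args)
            # Feed the tool result back so the model can use it on the next turn.
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("model did not finish within max_turns")

print(run_tool_loop("Weather in Tokyo?"))
```

The `max_turns` cap is a design choice worth keeping in a real loop as well, since a model that keeps requesting tools would otherwise run indefinitely.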
All major providers support function calling with slightly different APIs but the same core JSON Schema format:
```python
from litellm import completion

# Universal tool definition works across providers
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["city"]
            }
        }
    }
]

# Same code works for OpenAI, Anthropic, Google, open-source
# (each provider's API key must be set in the environment)
for model in ["gpt-4", "claude-sonnet-4-20250514", "gemini/gemini-pro"]:
    response = completion(
        model=model,
        messages=[{"role": "user", "content": "Weather in Tokyo?"}],
        tools=tools,
        tool_choice="auto"
    )
    tool_calls = response.choices[0].message.tool_calls
    if tool_calls:
        print(f"{model}: {tool_calls[0].function.name}({tool_calls[0].function.arguments})")
```
| Provider | API Field | Parallel Calls | Structured Output | Notable Feature |
| --- | --- | --- | --- | --- |
| OpenAI | tools | Yes | JSON mode, strict mode | Function calling fine-tuning support |
| Anthropic | tools | Yes | Tool use with tool_result | Forced tool use with tool_choice |
| Google Gemini | tools | Yes | Function declarations | Automatic grounding with Search |
| Open Source | Varies | Model-dependent | Via grammar constraints | BFCL leaderboard rankings |
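Because the providers share the same JSON Schema core, moving a tool definition between them is mostly a matter of re-wrapping it. The sketch below converts an OpenAI-style definition into the shape Anthropic's Messages API expects (`name`, `description`, `input_schema`). The helper name `openai_tool_to_anthropic` is hypothetical, and a real adapter would probably need to handle more fields than shown here.

```python
def openai_tool_to_anthropic(tool):
    """Re-wrap an OpenAI-style tool definition in Anthropic's tool shape.

    Hypothetical helper for illustration; only the common fields are handled.
    """
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # The JSON Schema itself is carried over unchanged.
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

converted = openai_tool_to_anthropic(weather_tool)
print(converted["name"], sorted(converted["input_schema"]["properties"]))
```

Only the wrapper keys change; the schema under `parameters` and `input_schema` is the same object.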
Modern LLMs can request multiple function calls in a single response when the operations are independent. Executing those calls concurrently reduces latency for workflows that require several data fetches:
```python
# Model returns multiple tool_calls in a single response
# e.g., "What's the weather in Tokyo and New York?"
# Returns: [call(get_weather, city="Tokyo"), call(get_weather, city="New York")]
import asyncio
import json

async def execute_parallel_calls(tool_calls, registry):
    # registry maps tool names to async functions (coroutines)
    tasks = [
        registry[call.function.name](**json.loads(call.function.arguments))
        for call in tool_calls
    ]
    # gather runs the calls concurrently and returns results in input order
    return await asyncio.gather(*tasks)
```
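To try the parallel pattern without a model, the tool calls can be mocked. The following self-contained variant is a sketch under stated assumptions: `make_call` and the canned `get_weather` coroutine are hypothetical, and `SimpleNamespace` only mimics the `call.function.name` / `call.function.arguments` shape used above. It also passes `return_exceptions=True`, one possible choice so that a single failing tool does not discard the other results.

```python
import asyncio
import json
import time
from types import SimpleNamespace

async def get_weather(city):
    # Hypothetical tool: the sleep simulates network latency.
    await asyncio.sleep(0.1)
    return {"city": city, "temperature": 21}

registry = {"get_weather": get_weather}

def make_call(name, **kwargs):
    # Mimics the shape of a tool call object: call.function.name / .arguments
    return SimpleNamespace(
        function=SimpleNamespace(name=name, arguments=json.dumps(kwargs))
    )

async def execute_parallel_calls(tool_calls, registry):
    tasks = [
        registry[c.function.name](**json.loads(c.function.arguments))
        for c in tool_calls
    ]
    # return_exceptions=True returns failures as values instead of raising
    return await asyncio.gather(*tasks, return_exceptions=True)

tool_calls = [
    make_call("get_weather", city="Tokyo"),
    make_call("get_weather", city="New York"),
]
start = time.perf_counter()
results = asyncio.run(execute_parallel_calls(tool_calls, registry))
elapsed = time.perf_counter() - start
print([r["city"] for r in results], f"{elapsed:.2f}s")
```

Since both simulated calls wait at the same time, the elapsed time should be close to one call's latency rather than the sum of both.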
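Structured output mode constrains the model's response to a provided JSON Schema. OpenAI's strict mode enforces schema compliance at the token generation level through constrained decoding; Anthropic's tool use likewise directs output to the tool's input schema, though how strictly the schema is enforced depends on the provider and configuration. This removes most parsing failures in production, but responses can still be truncated (for example at a token limit), so validating arguments before executing a tool remains prudent.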
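Where schema enforcement is not available, or as a defensive check before executing a tool, arguments can be validated in application code. The sketch below is a deliberately simplified, standard-library-only check covering required fields, basic types, and enums on a flat object schema. It is not a full JSON Schema validator, `validate_arguments` and `WEATHER_SCHEMA` are hypothetical names, and rejecting unexpected fields is an assumption made here, not something JSON Schema requires by default.

```python
import json

# Mapping from JSON Schema type names to Python types (simplified).
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}

WEATHER_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

def validate_arguments(raw_arguments, schema):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            # Assumption: treat unknown fields as errors (strict behaviour).
            errors.append(f"unexpected field: {name}")
            continue
        expected = _JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and spec.get("type") in ("integer", "number")
        if expected and (not isinstance(value, expected) or bool_as_number):
            errors.append(f"{name}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: must be one of {spec['enum']}")
    return errors

print(validate_arguments('{"city": "Tokyo", "unit": "celsius"}', WEATHER_SCHEMA))
print(validate_arguments('{"unit": "kelvin"}', WEATHER_SCHEMA))
```

Returning the error list to the model as the tool result is one common way to let it retry with corrected arguments.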
The Berkeley Function Calling Leaderboard (BFCL) is a widely used benchmark for evaluating function calling across models. It tests simple calls, parallel calls, selection among multiple functions, and relevance detection (recognizing when no function should be called).