Core Concepts
Reasoning Techniques
Memory Systems
Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools & Products
Safety & Governance
Evaluation
Research
Development
Meta
Gorilla is a large language model by Patil et al. (2023) specifically trained for accurate API calling through retriever-aware training. By conditioning the model on retrieved API documentation at both training and inference time, Gorilla surpasses GPT-4 on API call accuracy while dramatically reducing hallucinations — the fabrication of non-existent API endpoints or incorrect parameters. The accompanying APIBench benchmark provides a standardized evaluation for API-calling capabilities.
Gorilla's core innovation is integrating a document retriever directly into the training pipeline rather than relying on the model's parametric memory:
[Query] + [Retrieved Docs] → API Call

This approach grounds the model's API calls in actual documentation, preventing the hallucination of non-existent endpoints that plagues general-purpose LLMs.
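The retrieval step can be sketched with a toy lexical scorer. Gorilla itself uses a dense Contriever bi-encoder; the token-overlap ranking below is an illustrative stand-in, and the document corpus is hypothetical:

```python
def retrieve_doc(query: str, corpus: list[str]) -> str:
    """Return the API document with the highest token overlap with the query.
    Toy lexical stand-in for Gorilla's dense Contriever retriever."""
    q_tokens = set(query.lower().split())
    return max(corpus, key=lambda d: len(q_tokens & set(d.lower().split())))

# Hypothetical API documentation corpus
corpus = [
    "GET /forecast?city={city}&units={units} - Returns weather data",
    "POST /users - Creates a new user account",
]
doc = retrieve_doc("current weather forecast for San Francisco", corpus)
# doc is the /forecast endpoint documentation
```

The retrieved document is then prepended to the prompt so generation is conditioned on it, exactly as in the pipeline above.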
APIBench is a comprehensive benchmark for evaluating end-to-end API calling, built from the model APIs of HuggingFace Hub, TorchHub, and TensorFlow Hub:
```python
# Gorilla-style API call generation with retrieval
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load Gorilla model (fine-tuned LLaMA)
model = AutoModelForCausalLM.from_pretrained("gorilla-llm/gorilla-7b-hf-v1")
tokenizer = AutoTokenizer.from_pretrained("gorilla-llm/gorilla-7b-hf-v1")

def generate_api_call(query: str, retrieved_doc: str) -> str:
    """Generate a grounded API call from query + retrieved documentation."""
    prompt = (
        f"### User Query: {query}\n"
        f"### Retrieved API Documentation:\n{retrieved_doc}\n"
        f"### API Call:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example: grounded API call generation
query = "Get the current weather forecast for San Francisco"
doc = "GET /forecast?city={city}&units={units} - Returns weather data"
api_call = generate_api_call(query, doc)
# Output: {"api_call": "GET /forecast?city=San Francisco&units=metric"}
```
| Model | Seen APIs | Unseen APIs | Hallucination Rate |
|---|---|---|---|
| Gorilla-7B | 94.2% | 87.5% | ~5% |
| GPT-4 (zero-shot) | ~75% | ~70% | ~20% |
| LLaMA-7B (vanilla) | ~45% | ~35% | ~40% |
Gorilla reduces API hallucinations by 85-90% compared to general-purpose LLMs by grounding generation in retrieved documentation.
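A hallucination check of this kind can be sketched by testing whether a generated call names an endpoint that actually exists in the documentation corpus. Gorilla's evaluation uses AST sub-tree matching against the API database; the regex-based version below is a simplified stand-in, and the endpoint set is hypothetical:

```python
import re

def is_hallucinated(api_call: str, known_endpoints: set[tuple[str, str]]) -> bool:
    """Flag calls to endpoints absent from the documentation corpus.
    Simplified stand-in for Gorilla's AST sub-tree matching check."""
    m = re.search(r"(GET|POST|PUT|DELETE)\s+(/\w+)", api_call)
    if not m:
        return True  # an unparseable call counts as hallucinated
    return (m.group(1), m.group(2)) not in known_endpoints

# Hypothetical set of documented (method, endpoint) pairs
known = {("GET", "/forecast"), ("POST", "/users")}
assert not is_hallucinated("GET /forecast?city=San Francisco", known)
assert is_hallucinated("GET /weather_v2?city=SF", known)  # fabricated endpoint
```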
A critical advantage of Gorilla's architecture is version-agnostic inference: because documentation is retrieved at inference time, API changes are handled by updating the document index rather than retraining the model.
The retriever-aware training objective combines language modeling loss with retrieval relevance:
<latex>\mathcal{L} = \mathcal{L}_{LM}(y | x, d^*) + \lambda \mathcal{L}_{ret}(d^* | x)</latex>
where <latex>x</latex> is the user query, <latex>d^*</latex> is the retrieved documentation, <latex>y</latex> is the target API call, and <latex>\lambda</latex> balances the retrieval and generation losses.
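The combined objective can be sketched numerically: a negative log-likelihood term for the generated call plus a softmax cross-entropy term that pushes the gold document's retrieval score above the other candidates. The λ default here is illustrative, not a value from the paper:

```python
import numpy as np

def retriever_aware_loss(lm_logprob: float, ret_scores: np.ndarray,
                         gold_idx: int, lam: float = 0.5) -> float:
    """L = L_LM + lambda * L_ret.
    lm_logprob: log p(y | x, d*) of the target API call.
    ret_scores: retriever similarity scores for all candidate docs.
    gold_idx:   index of the gold document d* among the candidates."""
    lm_loss = -lm_logprob
    # retrieval loss: -log softmax(score of d*) over the candidate set
    ret_loss = -(ret_scores[gold_idx] - np.log(np.exp(ret_scores).sum()))
    return lm_loss + lam * ret_loss
```

Raising the gold document's score lowers the retrieval term, so both generation and retrieval quality are optimized jointly.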
| Component | Details |
|---|---|
| Base Model | LLaMA-7B (decoder-only transformer) |
| Retriever | Contriever bi-encoder, 768-dim embeddings |
| Input Format | `<Query> <Docs> Assistant: {"api_call": …}` |
| Context Length | 4K tokens |
| Training Data | ~10K query-doc pairs + 162K API documents |