====== LLM API Quick Reference ======
A comprehensive reference for the major LLM API providers: copy-paste-ready endpoints, pricing, and code snippets.
**Last updated:** March 2026
===== Provider Comparison Table =====
^ Provider ^ Model ^ Endpoint URL ^ Auth Method ^ Context Window ^ Input ($/1M tokens) ^ Output ($/1M tokens) ^
| **OpenAI** | GPT-4.1 | ''%%https://api.openai.com/v1/chat/completions%%'' | Bearer token | 1M | $2.00 | $8.00 |
| **OpenAI** | o3 | ''%%https://api.openai.com/v1/chat/completions%%'' | Bearer token | 200K | $2.00 | $8.00 |
| **OpenAI** | o4-mini | ''%%https://api.openai.com/v1/chat/completions%%'' | Bearer token | 200K | $1.10 | $4.40 |
| **OpenAI** | GPT-4o | ''%%https://api.openai.com/v1/chat/completions%%'' | Bearer token | 128K | $2.50 | $10.00 |
| **Anthropic** | Claude Opus 4.6 | ''%%https://api.anthropic.com/v1/messages%%'' | x-api-key header | 200K | $5.00 | $25.00 |
| **Anthropic** | Claude Sonnet 4.5 | ''%%https://api.anthropic.com/v1/messages%%'' | x-api-key header | 200K | $3.00 | $15.00 |
| **Anthropic** | Claude Haiku 3.5 | ''%%https://api.anthropic.com/v1/messages%%'' | x-api-key header | 200K | $0.80 | $4.00 |
| **Google** | Gemini 2.5 Pro | ''%%https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent%%'' | API key or Bearer | 1M | $1.25 | $10.00 |
| **Google** | Gemini 2.5 Flash | ''%%https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent%%'' | API key or Bearer | 1M | $0.15 | $0.60 |
| **Mistral** | Mistral Large | ''%%https://api.mistral.ai/v1/chat/completions%%'' | Bearer token | 128K | $2.00 | $6.00 |
| **Mistral** | Mistral Small | ''%%https://api.mistral.ai/v1/chat/completions%%'' | Bearer token | 128K | $0.40 | $2.00 |
| **DeepSeek** | DeepSeek-V3 | ''%%https://api.deepseek.com/v1/chat/completions%%'' | Bearer token | 164K | $0.14 | $0.28 |
| **DeepSeek** | DeepSeek-R1 | ''%%https://api.deepseek.com/v1/chat/completions%%'' | Bearer token | 164K | $0.55 | $2.19 |
| **Groq** | (Hosted models) | ''%%https://api.groq.com/openai/v1/chat/completions%%'' | Bearer token | Varies | ~$0.10 | ~$0.25 |
| **Together** | (Hosted models) | ''%%https://api.together.xyz/v1/chat/completions%%'' | Bearer token | Varies | ~$0.20 | ~$0.88 |
| **Fireworks** | (Hosted models) | ''%%https://api.fireworks.ai/inference/v1/chat/completions%%'' | Bearer token | Varies | ~$0.10 | ~$1.00 |
| **Ollama** | (Local models) | ''%%http://localhost:11434/api/chat%%'' | None (local) | Varies | Free | Free |
===== Code Snippets =====
==== OpenAI ====
<code python>
import openai

client = openai.OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
</code>

<code bash>
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4.1","messages":[{"role":"user","content":"Hello"}]}'
</code>
==== Anthropic ====
<code python>
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

message = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(message.content[0].text)
</code>

<code bash>
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-5-20250929","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'
</code>
==== Google Gemini ====
<code python>
from google import genai

client = genai.Client(api_key="AIza...")

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Hello!"
)
print(response.text)
</code>

<code bash>
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}'
</code>
==== Mistral ====
<code python>
from mistralai import Mistral

client = Mistral(api_key="...")

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
</code>
==== DeepSeek ====
<code python>
import openai

# DeepSeek's API is OpenAI-compatible: only base_url and api_key change
client = openai.OpenAI(
    api_key="sk-...",
    base_url="https://api.deepseek.com/v1"
)
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
</code>
==== Groq ====
<code python>
import openai

client = openai.OpenAI(
    api_key="gsk_...",
    base_url="https://api.groq.com/openai/v1"
)
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
</code>
==== Together AI ====
<code python>
import openai

client = openai.OpenAI(
    api_key="...",
    base_url="https://api.together.xyz/v1"
)
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
</code>
==== Fireworks AI ====
<code python>
import openai

client = openai.OpenAI(
    api_key="fw_...",
    base_url="https://api.fireworks.ai/inference/v1"
)
response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
</code>
==== Ollama (Local) ====
<code python>
import openai

# Ollama also exposes an OpenAI-compatible endpoint at /v1;
# the SDK requires a non-empty api_key, but Ollama ignores it
client = openai.OpenAI(
    api_key="ollama",
    base_url="http://localhost:11434/v1"
)
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
</code>

<code bash>
# /api/chat streams by default; "stream": false returns a single JSON object
curl http://localhost:11434/api/chat \
  -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello"}],"stream":false}'
</code>
===== Key Differences =====
^ Feature ^ OpenAI ^ Anthropic ^ Google ^ Others ^
| Auth Header | ''Authorization: Bearer'' | ''x-api-key'' | API key in URL or ''Authorization: Bearer'' | ''Authorization: Bearer'' |
| SDK Pattern | ''client.chat.completions.create()'' | ''client.messages.create()'' | ''client.models.generate_content()'' | OpenAI-compatible |
| Streaming | ''stream=True'' | ''stream=True'' (uses SSE) | ''stream=True'' | ''stream=True'' |
| Tool Calling | ''tools=[]'' param | ''tools=[]'' param | ''tools=[]'' param | Varies |
| Response Path | ''choices[0].message.content'' | ''content[0].text'' | ''response.text'' | OpenAI-compatible |
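==== Streaming Example ====

A minimal streaming sketch using the OpenAI SDK; since most providers above are OpenAI-compatible, the same pattern should work by swapping ''base_url''. The model name and key handling here are illustrative.

<code python>
import os
import openai

client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# stream=True yields incremental chunks instead of one final response
stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    stream=True,
)
for chunk in stream:
    # each chunk carries a small delta of the assistant's text
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
</code>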
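==== Tool Calling Example ====

A hedged sketch of the ''tools=[]'' pattern with the OpenAI SDK; ''get_weather'' and its schema are hypothetical, and Anthropic and Google use the same idea with slightly different schemas.

<code python>
import json
import os
import openai

client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# a hypothetical tool definition -- the model decides whether to call it
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative name, not a real API
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# assuming the model chose to call the tool; arguments arrive as a JSON string
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
</code>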
===== Tips =====
* Most providers (Groq, Together, Fireworks, DeepSeek) are **OpenAI SDK compatible** -- just change ''base_url'' and ''api_key''
* Always set ''max_tokens'' for Anthropic (required) and consider it for cost control elsewhere
* Use **streaming** for long responses to improve perceived latency
  * Store API keys in environment variables, never hardcode them (see the sketch below)
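A minimal sketch of the environment-variable tip, assuming the key was exported beforehand (e.g. ''export OPENAI_API_KEY=sk-...''):

<code python>
import os
import openai

# read the key from the environment instead of hardcoding it;
# os.environ[...] fails fast with a KeyError if the variable is missing
client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
</code>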