====== LLM API Quick Reference ======

Comprehensive reference for the major LLM API providers. Copy-paste-ready endpoints, pricing, and code snippets.

**Last updated:** March 2026

===== Provider Comparison Table =====

^ Provider ^ Model ^ Endpoint URL ^ Auth Method ^ Context ^ Input $/1M ^ Output $/1M ^
| **OpenAI** | GPT-4.1 | ''%%https://api.openai.com/v1/chat/completions%%'' | Bearer token | 1M | $2.00 | $8.00 |
| **OpenAI** | o3 | ''%%https://api.openai.com/v1/chat/completions%%'' | Bearer token | 200K | $2.00 | $8.00 |
| **OpenAI** | o4-mini | ''%%https://api.openai.com/v1/chat/completions%%'' | Bearer token | 200K | $1.10 | $4.40 |
| **OpenAI** | GPT-4o | ''%%https://api.openai.com/v1/chat/completions%%'' | Bearer token | 128K | $2.50 | $10.00 |
| **Anthropic** | Claude Opus 4.6 | ''%%https://api.anthropic.com/v1/messages%%'' | x-api-key header | 200K | $5.00 | $25.00 |
| **Anthropic** | Claude Sonnet 4.5 | ''%%https://api.anthropic.com/v1/messages%%'' | x-api-key header | 200K | $3.00 | $15.00 |
| **Anthropic** | Claude Haiku 3.5 | ''%%https://api.anthropic.com/v1/messages%%'' | x-api-key header | 200K | $0.80 | $4.00 |
| **Google** | Gemini 2.5 Pro | ''%%https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent%%'' | API key or Bearer | 1M | $1.25 | $10.00 |
| **Google** | Gemini 2.5 Flash | ''%%https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent%%'' | API key or Bearer | 1M | $0.15 | $0.60 |
| **Mistral** | Mistral Large | ''%%https://api.mistral.ai/v1/chat/completions%%'' | Bearer token | 128K | $2.00 | $6.00 |
| **Mistral** | Mistral Small | ''%%https://api.mistral.ai/v1/chat/completions%%'' | Bearer token | 128K | $0.40 | $2.00 |
| **DeepSeek** | DeepSeek-V3 | ''%%https://api.deepseek.com/v1/chat/completions%%'' | Bearer token | 164K | $0.14 | $0.28 |
| **DeepSeek** | DeepSeek-R1 | ''%%https://api.deepseek.com/v1/chat/completions%%'' | Bearer token | 164K | $0.55 | $2.19 |
| **Groq** | (Hosted models) | ''%%https://api.groq.com/openai/v1/chat/completions%%'' | Bearer token | Varies | ~$0.10 | ~$0.25 |
| **Together** | (Hosted models) | ''%%https://api.together.xyz/v1/chat/completions%%'' | Bearer token | Varies | ~$0.20 | ~$0.88 |
| **Fireworks** | (Hosted models) | ''%%https://api.fireworks.ai/inference/v1/chat/completions%%'' | Bearer token | Varies | ~$0.10 | ~$1.00 |
| **Ollama** | (Local models) | ''%%http://localhost:11434/api/chat%%'' | None (local) | Varies | Free | Free |

===== Code Snippets =====

==== OpenAI ====

<code python>
import openai

client = openai.OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
</code>

<code bash>
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4.1","messages":[{"role":"user","content":"Hello"}]}'
</code>

==== Anthropic ====

<code python>
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")
message = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(message.content[0].text)
</code>

<code bash>
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-5-20250929","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'
</code>
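Both snippets above return the whole completion at once. For long outputs each SDK can also stream tokens as they arrive (see the Key Differences table and Tips below). A minimal sketch, assuming the same keys are exported as the ''OPENAI_API_KEY'' / ''ANTHROPIC_API_KEY'' environment variables used in the curl examples:

<code python>
import os
import openai
import anthropic

openai_client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
anthropic_client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# OpenAI: pass stream=True and read incremental deltas from each chunk.
stream = openai_client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()

# Anthropic: the SDK's messages.stream() context manager wraps the SSE
# connection and exposes incremental text via text_stream.
with anthropic_client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as s:
    for text in s.text_stream:
        print(text, end="", flush=True)
print()
</code>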
==== Google Gemini ====

<code python>
from google import genai

client = genai.Client(api_key="AIza...")
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Hello!"
)
print(response.text)
</code>

<code bash>
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}'
</code>

==== Mistral ====

<code python>
from mistralai import Mistral

client = Mistral(api_key="...")
response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
</code>

==== DeepSeek ====

<code python>
import openai

client = openai.OpenAI(
    api_key="sk-...",
    base_url="https://api.deepseek.com/v1"
)
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
</code>

==== Groq ====

<code python>
import openai

client = openai.OpenAI(
    api_key="gsk_...",
    base_url="https://api.groq.com/openai/v1"
)
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
</code>

==== Together AI ====

<code python>
import openai

client = openai.OpenAI(
    api_key="...",
    base_url="https://api.together.xyz/v1"
)
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
</code>

==== Fireworks AI ====

<code python>
import openai

client = openai.OpenAI(
    api_key="fw_...",
    base_url="https://api.fireworks.ai/inference/v1"
)
response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
</code>

==== Ollama (Local) ====

<code python>
import openai

client = openai.OpenAI(
    api_key="ollama",  # any non-empty string; Ollama ignores it
    base_url="http://localhost:11434/v1"
)
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
</code>

<code bash>
curl http://localhost:11434/api/chat \
  -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello"}]}'
</code>

===== Key Differences =====

^ Feature ^ OpenAI ^ Anthropic ^ Google ^ Others ^
| Auth Header | ''Authorization: Bearer'' | ''x-api-key'' | API key in URL or Bearer | ''Authorization: Bearer'' |
| SDK Pattern | ''client.chat.completions.create()'' | ''client.messages.create()'' | ''client.models.generate_content()'' | OpenAI-compatible |
| Streaming | ''stream=True'' | ''stream=True'' (uses SSE) | ''client.models.generate_content_stream()'' | ''stream=True'' |
| Tool Calling | ''tools=[]'' param | ''tools=[]'' param | ''tools=[]'' param | Varies |
| Response Path | ''choices[0].message.content'' | ''content[0].text'' | ''response.text'' | OpenAI-compatible |

===== Tips =====

  * Most providers (Groq, Together, Fireworks, DeepSeek) are **OpenAI SDK compatible** -- just change ''base_url'' and ''api_key'' (see the sketch after this list)
  * Always set ''max_tokens'' for Anthropic (it is a required parameter) and consider setting it elsewhere for cost control
  * Use **streaming** for long responses to improve perceived latency
  * Store API keys in environment variables; never hardcode them
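Putting the first and last tips together, a minimal provider-switching sketch. The ''make_client'' helper, the provider table, and the environment-variable names other than ''OPENAI_API_KEY'' are this page's own conventions, not anything the SDKs define:

<code python>
import os
import openai

# Base URLs from the comparison table above; the key environment-variable
# names (beyond OPENAI_API_KEY) are assumed conventions.
PROVIDERS = {
    "openai":    ("https://api.openai.com/v1",             "OPENAI_API_KEY"),
    "deepseek":  ("https://api.deepseek.com/v1",           "DEEPSEEK_API_KEY"),
    "groq":      ("https://api.groq.com/openai/v1",        "GROQ_API_KEY"),
    "together":  ("https://api.together.xyz/v1",           "TOGETHER_API_KEY"),
    "fireworks": ("https://api.fireworks.ai/inference/v1", "FIREWORKS_API_KEY"),
    "ollama":    ("http://localhost:11434/v1",             None),  # no key needed
}

def make_client(provider: str) -> openai.OpenAI:
    """Build an OpenAI-compatible client for any provider in the table."""
    base_url, key_var = PROVIDERS[provider]
    api_key = os.environ[key_var] if key_var else "ollama"  # placeholder for local
    return openai.OpenAI(api_key=api_key, base_url=base_url)

# Same call shape everywhere; only the model name changes per provider.
client = make_client("groq")
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
</code>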