Current pricing and specifications for all major LLMs. Use this to pick the right model for your use case and budget. Prices are per 1M tokens via official API.
| Model | Provider | Context Window | Input $/1M ^ Output $/1M | Strengths | API Endpoint | |
|---|---|---|---|---|---|---|
| GPT-4o | OpenAI | 128K | $2.50 | $10.00 | Fast multimodal frontier model, vision + audio | api.openai.com | |
| GPT-4.1 | OpenAI | 1M | $2.00 | $8.00 | Long-context coding, instruction following | api.openai.com | |
| o3 | OpenAI | 200K | $10.00 | $40.00 | Deep reasoning with thinking tokens, math/science | api.openai.com | |
| o3-mini | OpenAI | 200K | $1.10 | $4.40 | Budget reasoning model | api.openai.com | |
| Claude Opus 4 | Anthropic | 200K | $15.00 | $75.00 | Most capable reasoning, complex analysis, agentic coding | api.anthropic.com | |
| Claude Sonnet 4 | Anthropic | 200K | $3.00 | $15.00 | Best price-performance, balanced speed + quality | api.anthropic.com | |
| Claude Haiku 3.5 | Anthropic | 200K | $0.80 | $4.00 | Fast + cheap, classification, extraction | api.anthropic.com | |
| Gemini 2.5 Pro | 1M | $1.25 / $2.50 | $10.00 / $15.00 | Massive context, tiered pricing (<200K / >200K) | generativelanguage.googleapis.com | |
| Gemini 2.5 Flash | 1M | $0.15 / $0.30 | $0.60 / $3.50 | Ultra-fast, cheapest reasoning model | generativelanguage.googleapis.com | |
| Llama 4 Maverick | Meta (via hosts) | 128K | $0.88 | $0.88 | Open-weight, strong multilingual, self-hostable | together.ai / fireworks.ai | |
| Llama 4 Scout | Meta (via hosts) | 128K | $0.11 | $0.22 | Budget open model, lightweight tasks | together.ai / fireworks.ai | |
| Mistral Large | Mistral | 128K | $2.00 | $6.00 | GDPR-compliant, strong European data handling | api.mistral.ai | |
| DeepSeek V3 | DeepSeek | 128K | $0.27 | $1.10 | Extreme value, cache hits at $0.07/M input | api.deepseek.com | |
| Qwen 3 | Alibaba | 128K-1M | $0.16 | $0.70 | Budget multilingual, scalable context variants | dashscope.aliyuncs.com |
| Use Case | Recommended Model | Why |
|---|---|---|
| Complex reasoning & analysis | Claude Opus 4 | Highest capability, best for multi-step reasoning |
| Daily coding assistant | Claude Sonnet 4 or GPT-4.1 | Strong code quality at reasonable cost |
| Long document processing | Gemini 2.5 Pro or GPT-4.1 | 1M context windows |
| High-volume classification | Gemini 2.5 Flash | Cheapest per token with reasoning |
| Budget-conscious production | DeepSeek V3 | $0.27/M input with caching at $0.07 |
| Self-hosted / open-weight | Llama 4 Maverick | Strong open model, no API costs at scale |
| Math / science reasoning | o3 | Purpose-built for deep reasoning tasks |
| European data compliance | Mistral Large | GDPR-compliant, EU-hosted option |
| Multilingual applications | Qwen 3 or Llama 4 | Strong multilingual benchmarks |
| Fastest response time | Gemini 2.5 Flash | Sub-second latency, streaming |
| Model | Context Window | Notes |
|---|---|---|
| Gemini 2.5 Pro | 1,000,000 | Largest production context |
| Gemini 2.5 Flash | 1,000,000 | Same window, lower cost |
| GPT-4.1 | 1,000,000 | Newest OpenAI long-context |
| Claude Opus 4 | 200,000 | Extended thinking available |
| Claude Sonnet 4 | 200,000 | Same window as Opus |
| o3 | 200,000 | Thinking tokens use context |
| GPT-4o | 128,000 | Standard frontier context |
| Llama 4 Maverick | 128,000 | Open-weight |
| DeepSeek V3 | 128,000 | Budget option |
| Mistral Large | 128,000 | EU-compliant |
Last updated: March 2026. Prices change frequently – verify with provider.