====== LLM Model Comparison ======

Current pricing and specifications for major LLMs. Use this to pick the right model for your use case and budget. Prices are in USD per 1M tokens via each provider's official API.

===== Model Comparison Table =====

^ Model ^ Provider ^ Context Window ^ Input $/1M ^ Output $/1M ^ Strengths ^ API Endpoint ^
| **GPT-4o** | OpenAI | 128K | $2.50 | $10.00 | Fast multimodal frontier model, vision + audio | api.openai.com |
| **GPT-4.1** | OpenAI | 1M | $2.00 | $8.00 | Long-context coding, instruction following | api.openai.com |
| **o3** | OpenAI | 200K | $10.00 | $40.00 | Deep reasoning with thinking tokens, math/science | api.openai.com |
| **o3-mini** | OpenAI | 200K | $1.10 | $4.40 | Budget reasoning model | api.openai.com |
| **Claude Opus 4** | Anthropic | 200K | $15.00 | $75.00 | Most capable reasoning, complex analysis, agentic coding | api.anthropic.com |
| **Claude Sonnet 4** | Anthropic | 200K | $3.00 | $15.00 | Best price-performance, balanced speed + quality | api.anthropic.com |
| **Claude Haiku 3.5** | Anthropic | 200K | $0.80 | $4.00 | Fast + cheap, classification, extraction | api.anthropic.com |
| **Gemini 2.5 Pro** | Google | 1M | $1.25 / $2.50 | $10.00 / $15.00 | Massive context, tiered pricing (<200K / >200K) | generativelanguage.googleapis.com |
| **Gemini 2.5 Flash** | Google | 1M | $0.15 / $0.30 | $0.60 / $3.50 | Ultra-fast, cheapest reasoning model | generativelanguage.googleapis.com |
| **Llama 4 Maverick** | Meta (via hosts) | 128K | $0.88 | $0.88 | Open-weight, strong multilingual, self-hostable | together.ai / fireworks.ai |
| **Llama 4 Scout** | Meta (via hosts) | 128K | $0.11 | $0.22 | Budget open model, lightweight tasks | together.ai / fireworks.ai |
| **Mistral Large** | Mistral | 128K | $2.00 | $6.00 | GDPR-compliant, strong European data handling | api.mistral.ai |
| **DeepSeek V3** | DeepSeek | 128K | $0.27 | $1.10 | Extreme value, cache hits at $0.07/M input | api.deepseek.com |
| **Qwen 3** | Alibaba | 128K-1M | $0.16 | $0.70 | Budget multilingual, scalable context variants | dashscope.aliyuncs.com |

===== Pricing Tiers Visualization =====

graph LR
  subgraph "Premium Tier ($40+ output)"
    A["Claude Opus 4<br/>$75 out"]
    B["o3<br/>$40 out"]
  end
  subgraph "Mid Tier ($6-15 output)"
    C["Claude Sonnet 4<br/>$15 out"]
    D["GPT-4o<br/>$10 out"]
    E["Gemini 2.5 Pro<br/>$10 out"]
    F["GPT-4.1<br/>$8 out"]
    G["Mistral Large<br/>$6 out"]
  end
  subgraph "Budget Tier (under $5 output)"
    H["Claude Haiku 3.5<br/>$4 out"]
    I["o3-mini<br/>$4.40 out"]
    J["Gemini 2.5 Flash<br/>$0.60 out"]
    K["DeepSeek V3<br/>$1.10 out"]
    L["Llama 4 Maverick<br/>$0.88 out"]
    M["Qwen 3<br/>$0.70 out"]
    N["Llama 4 Scout<br/>$0.22 out"]
  end
  style A fill:#e74c3c,color:#fff
  style B fill:#e74c3c,color:#fff
  style C fill:#e67e22,color:#fff
  style D fill:#e67e22,color:#fff
  style E fill:#e67e22,color:#fff
  style F fill:#e67e22,color:#fff
  style G fill:#e67e22,color:#fff
  style H fill:#2ecc71,color:#fff
  style I fill:#2ecc71,color:#fff
  style J fill:#2ecc71,color:#fff
  style K fill:#2ecc71,color:#fff
  style L fill:#2ecc71,color:#fff
  style M fill:#2ecc71,color:#fff
  style N fill:#2ecc71,color:#fff
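
To make the per-1M-token prices in the comparison table concrete, here is a minimal Python sketch of how a single request's cost could be estimated. The function names (''request_cost'', ''gemini_pro_cost'') are hypothetical, only a few rows of the table are copied in, and the tier rule for Gemini 2.5 Pro (prompt size at or below 200K tokens bills at the lower rate) is an assumption based on the "<200K / >200K" note; check the provider's pricing page for the exact boundary.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table above.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "DeepSeek V3": (0.27, 1.10),
}

# Gemini 2.5 Pro lists two price tiers. ASSUMPTION: the tier is selected by
# prompt (input) size, with exactly 200K tokens falling in the lower tier.
GEMINI_PRO_TIERS = {"low": (1.25, 10.00), "high": (2.50, 15.00)}
TIER_THRESHOLD = 200_000


def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the USD cost of one request given per-1M-token prices."""
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


def gemini_pro_cost(input_tokens, output_tokens):
    """Return the USD cost for Gemini 2.5 Pro under the assumed tier rule."""
    tier = "low" if input_tokens <= TIER_THRESHOLD else "high"
    input_price, output_price = GEMINI_PRO_TIERS[tier]
    return request_cost(input_tokens, output_tokens, input_price, output_price)


if __name__ == "__main__":
    # 10K prompt tokens and 2K completion tokens on each flat-priced model.
    for model, (in_price, out_price) in PRICES.items():
        print(f"{model}: ${request_cost(10_000, 2_000, in_price, out_price):.4f}")
    # A 300K-token prompt crosses into the higher Gemini 2.5 Pro tier.
    print(f"Gemini 2.5 Pro: ${gemini_pro_cost(300_000, 1_000):.4f}")
```

Output tokens usually dominate the bill for generation-heavy workloads because output prices are several times the input prices for most rows, so estimate both sides before comparing models.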
===== Decision Guide =====

^ Use Case ^ Recommended Model ^ Why ^
| Complex reasoning & analysis | Claude Opus 4 | Highest capability, best for multi-step reasoning |
| Daily coding assistant | Claude Sonnet 4 or GPT-4.1 | Strong code quality at reasonable cost |
| Long document processing | Gemini 2.5 Pro or GPT-4.1 | 1M context windows |
| High-volume classification | Gemini 2.5 Flash | Cheapest per token with reasoning |
| Budget-conscious production | DeepSeek V3 | $0.27/M input, $0.07/M on cache hits |
| Self-hosted / open-weight | Llama 4 Maverick | Strong open model, no per-token API fees when self-hosted |
| Math / science reasoning | o3 | Purpose-built for deep reasoning tasks |
| European data compliance | Mistral Large | GDPR-compliant, EU-hosted option |
| Multilingual applications | Qwen 3 or Llama 4 | Strong multilingual benchmarks |
| Fastest response time | Gemini 2.5 Flash | Sub-second latency, streaming |

===== Context Window Comparison =====

^ Model ^ Context Window (tokens) ^ Notes ^
| Gemini 2.5 Pro | 1,000,000 | Tied for largest context in this table |
| Gemini 2.5 Flash | 1,000,000 | Same window, lower cost |
| GPT-4.1 | 1,000,000 | Newest OpenAI long-context |
| Claude Opus 4 | 200,000 | Extended thinking available |
| Claude Sonnet 4 | 200,000 | Same window as Opus |
| o3 | 200,000 | Thinking tokens use context |
| GPT-4o | 128,000 | Standard frontier context |
| Llama 4 Maverick | 128,000 | Open-weight |
| DeepSeek V3 | 128,000 | Budget option |
| Mistral Large | 128,000 | EU-compliant |

//Last updated: March 2026. Prices change frequently -- verify with the provider.//
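
As a rough illustration of how the context window figures can drive model selection, the Python sketch below filters models by whether a job fits. The helper name ''models_that_fit'' is hypothetical, and the sketch assumes that prompt tokens plus any reserved output (or thinking) tokens must fit in one window, which is a simplification; providers count tokens with their own tokenizers and may cap output separately.

```python
# Context windows in tokens, copied from the Context Window Comparison table.
CONTEXT_WINDOWS = {
    "Gemini 2.5 Pro": 1_000_000,
    "Gemini 2.5 Flash": 1_000_000,
    "GPT-4.1": 1_000_000,
    "Claude Opus 4": 200_000,
    "Claude Sonnet 4": 200_000,
    "o3": 200_000,
    "GPT-4o": 128_000,
    "Llama 4 Maverick": 128_000,
    "DeepSeek V3": 128_000,
    "Mistral Large": 128_000,
}


def models_that_fit(prompt_tokens, reserved_output_tokens=0):
    """List models whose window holds the prompt plus reserved output.

    ASSUMPTION: input and output share a single window, so both are summed.
    """
    needed = prompt_tokens + reserved_output_tokens
    return [name for name, window in CONTEXT_WINDOWS.items() if window >= needed]


if __name__ == "__main__":
    # A 150K-token document with 8K tokens reserved for the answer.
    print(models_that_fit(150_000, reserved_output_tokens=8_000))
```

Combining this filter with a per-request cost estimate gives a simple shortlist: first drop the models that cannot hold the input, then compare the rest on price.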