====== LLM Model Comparison ======

Pricing and specifications for the major LLMs listed below. Use this page to pick a model that fits your use case and budget. All prices are in USD per 1M tokens, as charged through each provider's official API.

===== Model Comparison Table =====

^ Model ^ Provider ^ Context Window ^ Input $/1M ^ Output $/1M ^ Strengths ^ API Endpoint ^
| **GPT-4o** | OpenAI | 128K | $2.50 | $10.00 | Fast multimodal frontier model, vision + audio | api.openai.com |
| **GPT-4.1** | OpenAI | 1M | $2.00 | $8.00 | Long-context coding, instruction following | api.openai.com |
| **o3** | OpenAI | 200K | $10.00 | $40.00 | Deep reasoning with thinking tokens, math/science | api.openai.com |
| **o3-mini** | OpenAI | 200K | $1.10 | $4.40 | Budget reasoning model | api.openai.com |
| **Claude Opus 4** | Anthropic | 200K | $15.00 | $75.00 | Most capable reasoning, complex analysis, agentic coding | api.anthropic.com |
| **Claude Sonnet 4** | Anthropic | 200K | $3.00 | $15.00 | Best price-performance, balanced speed + quality | api.anthropic.com |
| **Claude Haiku 3.5** | Anthropic | 200K | $0.80 | $4.00 | Fast + cheap, classification, extraction | api.anthropic.com |
| **Gemini 2.5 Pro** | Google | 1M | $1.25 / $2.50 | $10.00 / $15.00 | Massive context, tiered pricing (<200K / >200K) | generativelanguage.googleapis.com |
| **Gemini 2.5 Flash** | Google | 1M | $0.15 / $0.30 | $0.60 / $3.50 | Ultra-fast, cheapest reasoning model | generativelanguage.googleapis.com |
| **Llama 4 Maverick** | Meta (via hosts) | 128K | $0.88 | $0.88 | Open-weight, strong multilingual, self-hostable | together.ai / fireworks.ai |
| **Llama 4 Scout** | Meta (via hosts) | 128K | $0.11 | $0.22 | Budget open model, lightweight tasks | together.ai / fireworks.ai |
| **Mistral Large** | Mistral | 128K | $2.00 | $6.00 | GDPR-compliant, strong European data handling | api.mistral.ai |
| **DeepSeek V3** | DeepSeek | 128K | $0.27 | $1.10 | Extreme value, cache hits at $0.07/M input | api.deepseek.com |
| **Qwen 3** | Alibaba | 128K-1M | $0.16 | $0.70 | Budget multilingual, scalable context variants | dashscope.aliyuncs.com |
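Because every price is quoted per 1M tokens, the cost of a single request is ''(input_tokens × input_price + output_tokens × output_price) / 1,000,000''. The sketch below applies that formula to a few rows of the table. The names ''FLAT_PRICES'', ''cost_usd'', ''flat_cost'' and ''gemini_pro_cost'' are illustrative, not part of any provider SDK. It also assumes that a Gemini 2.5 Pro prompt of exactly 200K tokens is billed at the lower tier and that the higher tier applies to the whole request once the prompt exceeds 200K.

```python
# USD per 1M tokens as (input, output), copied from the table above.
FLAT_PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "DeepSeek V3": (0.27, 1.10),
}

# Gemini 2.5 Pro is tiered by prompt size: (input, output) per tier.
GEMINI_PRO_TIERS = {"low": (1.25, 10.00), "high": (2.50, 15.00)}
TIER_THRESHOLD = 200_000  # assumption: exactly 200K tokens uses the lower tier


def cost_usd(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request, given prices per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def flat_cost(model, input_tokens, output_tokens):
    """Cost for a model with a single input price and a single output price."""
    input_price, output_price = FLAT_PRICES[model]
    return cost_usd(input_tokens, output_tokens, input_price, output_price)


def gemini_pro_cost(input_tokens, output_tokens):
    """Cost for Gemini 2.5 Pro, picking the tier from the prompt size."""
    tier = "low" if input_tokens <= TIER_THRESHOLD else "high"
    input_price, output_price = GEMINI_PRO_TIERS[tier]
    return cost_usd(input_tokens, output_tokens, input_price, output_price)
```

For example, a 10,000-token prompt with a 2,000-token reply on Claude Sonnet 4 comes to about $0.06 under these prices.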

===== Pricing Tiers Visualization =====

<mermaid>
graph LR
    subgraph "Premium Tier (over $15 output)"
        A["Claude Opus 4<br>$75 out"]
        B["o3<br>$40 out"]
    end
    
    subgraph "Mid Tier ($5-15 output)"
        C["Claude Sonnet 4<br>$15 out"]
        D["GPT-4o<br>$10 out"]
        E["Gemini 2.5 Pro<br>$10 out"]
        F["GPT-4.1<br>$8 out"]
        G["Mistral Large<br>$6 out"]
    end
    
    subgraph "Budget Tier (under $5 output)"
        H["Claude Haiku 3.5<br>$4 out"]
        I["o3-mini<br>$4.40 out"]
        J["Gemini 2.5 Flash<br>$0.60 out"]
        K["DeepSeek V3<br>$1.10 out"]
        L["Llama 4 Maverick<br>$0.88 out"]
        M["Qwen 3<br>$0.70 out"]
        N["Llama 4 Scout<br>$0.22 out"]
    end

    style A fill:#e74c3c,color:#fff
    style B fill:#e74c3c,color:#fff
    style C fill:#e67e22,color:#fff
    style D fill:#e67e22,color:#fff
    style E fill:#e67e22,color:#fff
    style F fill:#e67e22,color:#fff
    style G fill:#e67e22,color:#fff
    style H fill:#2ecc71,color:#fff
    style I fill:#2ecc71,color:#fff
    style J fill:#2ecc71,color:#fff
    style K fill:#2ecc71,color:#fff
    style L fill:#2ecc71,color:#fff
    style M fill:#2ecc71,color:#fff
    style N fill:#2ecc71,color:#fff
</mermaid>
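The grouping in the diagram can be reproduced by bucketing each model on its output price. This is a minimal sketch: the cut-offs of $5 and $15 are assumptions chosen so that every model lands in the same group as in the diagram, and ''tier_for'' is a hypothetical helper name.

```python
# Output prices in USD per 1M tokens, as shown in the diagram.
OUTPUT_PRICES = {
    "Claude Opus 4": 75.00,
    "o3": 40.00,
    "Claude Sonnet 4": 15.00,
    "GPT-4.1": 8.00,
    "o3-mini": 4.40,
    "Claude Haiku 3.5": 4.00,
    "Llama 4 Scout": 0.22,
}


def tier_for(output_price):
    """Bucket a model by output price (assumed cut-offs: $5 and $15)."""
    if output_price > 15:
        return "premium"
    if output_price >= 5:
        return "mid"
    return "budget"


# Map every model to its tier name.
TIERS = {model: tier_for(price) for model, price in OUTPUT_PRICES.items()}
```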

===== Decision Guide =====

^ Use Case ^ Recommended Model ^ Why ^
| Complex reasoning & analysis | Claude Opus 4 | Highest capability, best for multi-step reasoning |
| Daily coding assistant | Claude Sonnet 4 or GPT-4.1 | Strong code quality at reasonable cost |
| Long document processing | Gemini 2.5 Pro or GPT-4.1 | 1M context windows |
| High-volume classification | Gemini 2.5 Flash | Cheapest per token with reasoning |
| Budget-conscious production | DeepSeek V3 | $0.27/M input, dropping to $0.07/M on cache hits |
| Self-hosted / open-weight | Llama 4 Maverick | Strong open model, no per-token API fees when self-hosted (infrastructure costs still apply) |
| Math / science reasoning | o3 | Purpose-built for deep reasoning tasks |
| European data compliance | Mistral Large | GDPR-compliant, EU-hosted option |
| Multilingual applications | Qwen 3 or Llama 4 | Strong multilingual benchmarks |
| Fastest response time | Gemini 2.5 Flash | Sub-second latency, streaming |

===== Context Window Comparison =====

^ Model ^ Context Window (tokens) ^ Notes ^
| Gemini 2.5 Pro | 1,000,000 | Largest context in this comparison (shared with Gemini 2.5 Flash and GPT-4.1) |
| Gemini 2.5 Flash | 1,000,000 | Same window, lower cost |
| GPT-4.1 | 1,000,000 | Newest OpenAI long-context |
| Claude Opus 4 | 200,000 | Extended thinking available |
| Claude Sonnet 4 | 200,000 | Same window as Opus |
| o3 | 200,000 | Thinking tokens use context |
| GPT-4o | 128,000 | Standard frontier context |
| Llama 4 Maverick | 128,000 | Open-weight |
| DeepSeek V3 | 128,000 | Budget option |
| Mistral Large | 128,000 | EU-compliant |
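A quick way to use this table is to filter out models whose window cannot hold a given request. The sketch below assumes that the prompt and the reserved output tokens (including any thinking tokens) must fit in the window together. ''CONTEXT_WINDOWS'' and ''models_that_fit'' are illustrative names, and token counts are taken as given rather than computed with a real tokenizer.

```python
# Context windows in tokens, copied from the table above.
CONTEXT_WINDOWS = {
    "Gemini 2.5 Pro": 1_000_000,
    "Gemini 2.5 Flash": 1_000_000,
    "GPT-4.1": 1_000_000,
    "Claude Opus 4": 200_000,
    "Claude Sonnet 4": 200_000,
    "o3": 200_000,
    "GPT-4o": 128_000,
    "Llama 4 Maverick": 128_000,
    "DeepSeek V3": 128_000,
    "Mistral Large": 128_000,
}


def models_that_fit(prompt_tokens, reserved_output_tokens=0):
    """Return models whose window holds the prompt plus the reserved output."""
    needed = prompt_tokens + reserved_output_tokens
    return [model for model, window in CONTEXT_WINDOWS.items() if needed <= window]
```

Under these assumptions, a 500,000-token document only fits the three 1M-token models, while a 150,000-token prompt with 8,000 tokens reserved for output also fits the 200K models but not the 128K ones.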

//Last updated: March 2026. Prices change frequently; verify against each provider's current pricing page before budgeting.//