====== Fireworks AI ======

**Fireworks AI** is an enterprise-grade inference platform specializing in high-performance serving of open-source and custom AI models, optimized for speed, scalability, and production workloads. Founded by Lin Qiao, the platform processes over 13 trillion tokens daily, sustaining approximately 180,000 requests per second across its globally distributed Inference Cloud.((source [[https://fireworks.ai|Fireworks AI official site]]))

===== Overview =====

Fireworks AI provides serverless and dedicated inference for a wide range of open-source models, along with tools for fine-tuning, function calling, and compound AI system development. The platform targets production teams that require enterprise-grade reliability, SLA guarantees, and auto-scaling.((source [[https://fireworks.ai/enterprise|Fireworks AI Enterprise]]))

===== Performance =====

The platform delivers high throughput at low latency:

  * Over 13 trillion tokens processed daily
  * Approximately 180,000 requests per second of sustained throughput
  * 1,000+ tokens per second generation on large models
  * Top rankings on Artificial Analysis benchmark evaluations

Fireworks achieves these speeds through proprietary inference engine optimizations, including **FireOptimizer** for automatic model optimization during deployment.((source [[https://docs.fireworks.ai/models/overview|Fireworks AI Models Overview]]))

===== Supported Models =====

The platform provides serverless inference for pre-deployed models, including:

  * **Llama 3.1** (70B Instruct and 405B)
  * **Mixtral** and **Mistral** variants
  * **DeepSeek** models
  * Vision and image models (Stable Diffusion, Flux)
  * Speech-to-text (Whisper-v3)

Users can also upload custom base models and fine-tuned weights (including LoRA adapters) and deploy them through the same unified API.

===== Function Calling =====

**FireFunction** provides reliable function calling and structured JSON output from open-source models.
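As an illustrative sketch only: Fireworks exposes an OpenAI-compatible chat API, so a function-calling request can be assembled as a tools-bearing payload. The endpoint path, model identifier, and ''get_weather'' tool below are assumptions for illustration, not values confirmed by this article.

```python
import json

# Assumed OpenAI-compatible endpoint path; verify against the Fireworks docs.
FIREWORKS_ENDPOINT = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_function_call_request(user_message: str) -> dict:
    """Assemble an OpenAI-style chat request exposing one tool.

    When the tool matches the user's intent, the model is expected to
    return a structured tool call (JSON arguments) instead of free text.
    """
    return {
        "model": "accounts/fireworks/models/firefunction-v2",  # hypothetical model id
        "messages": [{"role": "user", "content": user_message}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for this sketch
                "description": "Look up the current weather for a city",
                "parameters": {  # JSON Schema describing the arguments
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

# The payload would be POSTed to FIREWORKS_ENDPOINT with an API key;
# a tool call then comes back under choices[0].message.tool_calls.
payload = build_function_call_request("What's the weather in Oslo?")
print(json.dumps(payload, indent=2))
```

Because the request shape follows the OpenAI chat-completions convention, existing OpenAI client code can typically be pointed at the Fireworks base URL with minimal changes.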
This capability enables production-ready agent architectures where consistent tool use and structured outputs are critical for workflow automation.((source [[https://megaoneai.com/spotlight/fireworks-ai-review/|Fireworks AI Review]]))

===== Compound AI Systems =====

Fireworks supports building complex AI systems, including:

  * AI agents with tool integration
  * Retrieval-Augmented Generation (RAG) pipelines
  * Multi-step workflows combining text, vision, embeddings, image generation, and speech-to-text through a unified API

===== Pricing =====

Fireworks uses per-token billing for serverless inference, with rates scaled by model size. Enterprise plans include SLA-backed uptime and compliance features, with no additional charge for fine-tuning or deploying custom models. New users receive startup credits, with pay-as-you-go billing thereafter.((source [[https://fireworks.ai/startups|Fireworks AI Startups]]))

===== Microsoft Foundry Integration =====

In 2025-2026, Fireworks entered public preview on **Microsoft Foundry** (Azure), embedding its inference engine to serve state-of-the-art open models with enterprise governance controls and customization capabilities.((source [[https://azure.microsoft.com/en-us/blog/introducing-fireworks-ai-on-microsoft-foundry-bringing-high-performance-low-latency-open-model-inference-to-azure/|Fireworks on Microsoft Foundry]]))

===== See Also =====

  * [[together_ai|Together AI]]
  * [[replicate|Replicate]]
  * [[groq_inference|Groq Inference]]

===== References =====