Together AI

Together AI is a full-stack AI cloud platform specializing in fast inference, fine-tuning, pre-training, and GPU cluster management for open-source models. Founded as an inference-focused startup, Together AI reached a $3.3 billion valuation and $300 million annualized revenue by September 2025, serving companies including Cursor, Decagon, and Cartesia.¹⁾

Overview

Together AI provides developers and researchers with a unified API to run, train, fine-tune, and deploy open-source AI models across text, image, video, code, and voice modalities. The platform emphasizes end-to-end workflows from training to production, supporting over 200 open-source models with an OpenAI-compatible API for seamless integration.²⁾

Supported Models

The platform supports a broad range of open-source models including:

Llama variants (Meta)
Mixtral and Mistral models
Qwen series (including Qwen-3-235B-Instruct and Qwen-3-Coder-480B)
DeepSeek models (DeepSeek-R1, DeepSeek-V3.1)
GPT-OSS (20B and 120B variants)
Image and video generation models (40+ via Runware partnership)
NVIDIA Parakeet for voice ASR

Inference

Together AI achieves up to 2.75x faster serverless inference compared to competitors through GPU optimizations, low-bit quantization (FP4/FP8), and ATLAS (Adaptive Speculative Decoding), which provides up to 4x acceleration via runtime learning.³⁾

The platform offers two inference tiers:

Serverless Inference: On-demand model execution without infrastructure management, supporting batch inference at 50% lower cost
Dedicated Endpoints: Instant GPU clusters (from self-serve to thousands of GPUs) for low-latency, high-throughput workloads, achieving up to 110 tokens/sec on reasoning clusters

Fine-Tuning

Together AI provides a full fine-tuning platform supporting LoRA and DPO methods for task-specific model customization using proprietary data. The platform also supports pre-training from scratch on GPU clusters, with seamless transition from training to inference endpoints.⁴⁾

Pricing

Pricing ranges from $0.10 to $3.50 per million tokens depending on model size and optimization level. Batch inference is available at 50% lower cost. The platform claims approximately 60% cost reduction overall through quantization and inference optimizations.⁵⁾

Recent Developments

Achieved top benchmarks for inference speed on demanding models (2x faster on GPT-OSS, Qwen, DeepSeek)
Launched Dedicated Container Inference for custom media models with 1.4x-2.6x speedups
Hit $300 million ARR by September 2025
Showcased at NVIDIA GTC 2026 with NemoClaw integration for 150+ models⁶⁾