Superhuman

Superhuman is an AI-powered productivity platform that leverages large language models to enhance communication and workflow efficiency. The platform serves over 40 million daily active users with real-time AI communication assistance features integrated across its product suite ¹⁾

Overview and Products

Superhuman operates a comprehensive suite of productivity tools designed to accelerate user workflows through AI integration. The platform's primary offerings include Superhuman Mail, a next-generation email client with AI-powered features; Coda, a collaborative document and workspace platform; and Superhuman Go, a mobile-optimized communication application ²⁾. These products are designed to reduce cognitive overhead and enable users to accomplish more with less time spent on communication and administrative tasks.

Superhuman Mail leverages the custom grammar correction LLM to provide real-time suggestions on correctness, clarity, tone, and style, serving as one of the primary surfaces where Superhuman's AI assistance reaches end-users at massive scale with sub-second latency requirements ³⁾. The platform's scale demonstrates significant market adoption, with the 40 million daily active users representing a substantial portion of professional workers relying on AI-assisted productivity features. This user base creates substantial infrastructure requirements for supporting real-time AI inference at scale. Superhuman Go, the mobile component of the product suite, benefits from the same custom LLM-powered AI communication assistance and grammar correction capabilities, extending the platform's real-time assistance features to users across millions of daily mobile interactions ⁴⁾

Infrastructure and Technical Architecture

Superhuman operates a custom grammar correction language model capable of handling 200,000 queries per second (QPS) at peak load, while maintaining sub-second P99 latency performance ⁵⁾. This technical achievement requires sophisticated infrastructure design to handle both the throughput demands and latency requirements of real-time user-facing applications.

The company initially built its inference infrastructure using a DIY vLLM stack, where vLLM (a commonly used open-source LLM serving framework) was self-managed. However, this approach created operational overhead and scaling challenges. Superhuman subsequently migrated to Databricks FMAPI Provisioned Throughput, a managed inference service that provides dedicated compute resources for LLM serving. This migration addressed both infrastructure complexity and allowed the platform to achieve consistent performance targets while reducing operational burden.

The shift from self-managed to managed inference infrastructure represents a common pattern in scaling AI applications, where companies transition from DIY approaches to managed services as user demands and operational complexity increase. Provisioned throughput approaches offer predictable performance characteristics and eliminate the need for manual scaling management.

Expanding AI Integration

Beyond the core grammar correction model powering real-time communication features, Superhuman is expanding AI integration across additional workflow domains ⁶⁾. The company is leveraging its infrastructure investment for training workflows, experiment tracking, and model evaluation systems. This broader adoption indicates a strategy to embed AI assistance across the entire productivity platform rather than limiting it to specific feature areas.

Training workflows refer to the processes of fine-tuning models or developing new AI capabilities tailored to specific use cases. Experiment tracking systems monitor and log parameters, metrics, and results from different model configurations to enable systematic improvement. Model evaluation systems assess performance across different dimensions and use cases. By consolidating these capabilities on a unified infrastructure platform, Superhuman can maintain consistency and efficiency across its AI development pipeline.

Performance Characteristics and Requirements

The sub-second P99 latency requirement reflects the demanding nature of real-time communication assistance. Users expect AI features like grammar correction and compositional assistance to respond instantaneously without disrupting their workflow. Achieving sub-second response times at 200K QPS requires careful attention to model size, serving infrastructure, caching strategies, and network optimization.

The shift to provisioned throughput infrastructure suggests that Superhuman prioritizes predictable performance over cost optimization, which aligns with a platform serving high-value professional users where service degradation carries significant opportunity costs. Managed inference services trade flexibility for reliability and consistent SLA guarantees.

References

¹⁾ , ²⁾ , ³⁾ , ⁵⁾ , ⁶⁾

Databricks - How Superhuman and Databricks Built 200K QPS Inference Platform Together (2026

⁴⁾

Databricks, 2026

AI Agent Knowledge Base

Sidebar

Table of Contents

Superhuman

Overview and Products

Infrastructure and Technical Architecture

Expanding AI Integration

Performance Characteristics and Requirements

See Also

References

AI Agent Knowledge Base

User Tools

Site Tools

Sidebar

Table of Contents

Superhuman

Overview and Products

Infrastructure and Technical Architecture

Expanding AI Integration

Performance Characteristics and Requirements

See Also

References

Page Tools