Moonshot AI Kimi K2

Moonshot AI's Kimi K2 is a trillion-parameter open-source large language model built on a Mixture-of-Experts (MoE) Transformer architecture. First released in mid-2025, Kimi K2 represents one of China's most ambitious contributions to frontier AI, activating only 32 billion of its 1.04 trillion total parameters per token for efficient inference. ¹⁾ An upgraded version, Kimi K2.5, followed in January 2026 with native multimodal capabilities and an expanded 256K context window. ²⁾

Architecture

Kimi K2 employs a dense-sparse hybrid design with the following specifications:

Total parameters: 1.04 trillion
Active parameters per token: 32 billion
Layers: 61 (including 1 dense layer)
Experts: 384 total, 8 selected per token, plus 1 shared expert
Attention mechanism: Multi-head Latent Attention (MLA)
Activation function: SwiGLU
Vocabulary size: 160,000 tokens
Context window: 128K tokens (K2), 256K tokens (K2.5)

The MLA mechanism compresses key-value projections into a lower-dimensional space before computing attention scores, reducing memory bandwidth by 40-50%. ³⁾ The model was trained using the Muon optimizer, purpose-built for trillion-parameter MoE models.

Training

Kimi K2.5 was trained on 15 trillion mixed visual and textual tokens in a unified pipeline, allowing vision and language capabilities to develop together rather than as separate modules. ⁴⁾ The vision component uses MoonViT, a 400-million-parameter vision encoder that processes images through the same transformer architecture as text.

Key Capabilities

Agentic intelligence: Extended step-by-step reasoning and tool use optimization
Agent Swarm Technology: K2.5 can coordinate up to 100 specialized AI agents simultaneously, achieving 4.5x faster execution while reducing costs by 76% compared to leading closed models ⁵⁾
Visual coding: Generates production-ready React or HTML from UI mockups, including responsive design and accessibility
Autonomous visual debugging: Renders generated code, compares against original designs, and iteratively refines output

Benchmark Performance

Kimi K2.5 achieved 50.2% on Humanity's Last Exam at significantly lower cost than comparable closed models. ⁶⁾ Both versions achieve state-of-the-art open-model performance across code, reasoning, and multi-step tasks.

Deployment

Both models are fully open-source, distributed via HuggingFace in base and instruction-tuned variants. K2 runs at approximately 15 tokens/s on two Apple M3 Ultras. Using Unsloth Dynamic 1.8-bit quantization, K2.5's disk requirements drop from 600GB to 240GB, enabling operation on a single 24GB GPU with system RAM offloading. ⁷⁾ A notable trade-off is that K2 exhibits 2-2.5x higher token usage compared to other models.

China's MoE Advancements

Kimi K2 is part of a broader wave of Chinese MoE model development. Moonshot AI, founded in 2023 by former Tsinghua University researchers, has positioned itself alongside DeepSeek and other Chinese labs pushing the boundaries of efficient large-scale AI. The MoE architecture allows these models to compete with Western frontier models while requiring significantly less compute per inference request.

References

¹⁾

Source: Intuition Labs — Kimi K2 Technical Deep Dive

²⁾ , ³⁾ , ⁵⁾ , ⁶⁾

Source: Codecademy — Kimi K2.5 Complete Guide

⁴⁾

Source: NVIDIA — Kimi K2.5 Model Card

⁷⁾

Source: Unsloth — Kimi K2.5

AI Agent Knowledge Base

Sidebar

Table of Contents

Moonshot AI Kimi K2

Architecture

Training

Key Capabilities

Benchmark Performance

Deployment

China's MoE Advancements

See Also

References

AI Agent Knowledge Base

User Tools

Site Tools

Sidebar

Table of Contents

Moonshot AI Kimi K2

Architecture

Training

Key Capabilities

Benchmark Performance

Deployment

China's MoE Advancements

See Also

References

Page Tools