====== DeepSeek ======

**DeepSeek** is a Chinese AI research company headquartered in Hangzhou, founded in 2023 by **Liang Wenfeng**, who also founded the quantitative hedge fund High-Flyer Capital. DeepSeek disrupted the global AI industry by demonstrating that frontier-class models could be trained for a fraction of the cost of Western competitors, using efficient Mixture of Experts (MoE) architectures. The company's models are fully open-source and available for self-hosting.((source [[https://www.britannica.com/money/DeepSeek|DeepSeek on Britannica]]))

===== Key Models =====

==== DeepSeek-V3 ====

The flagship general-purpose model, released in December 2024, with V3.2 following in December 2025. DeepSeek-V3 handles writing, analysis, coding, and data tasks, matching the quality of GPT-5 and Claude Sonnet 4.6 at significantly lower cost. Its context window holds approximately 2,000 pages of text.((source [[https://mysummit.school/blog/en/deepseek-review-2026/|DeepSeek Review 2026]]))

==== DeepSeek-R1 ====

A dedicated reasoning model released in January 2025, excelling at step-by-step logical reasoning and outperforming OpenAI's o1-mini. R1 displays its full reasoning trace to users. Its release triggered a major market shock when investors realized that its base model, DeepSeek-V3, had been trained for approximately **$5.6 million** using optimized NVIDIA H800 chips, compared to over $100 million for comparable Western models.((source [[https://www.voiceflow.com/blog/what-is-deepseek|What is DeepSeek]]))

==== Other Models ====

  * **DeepSeek-Janus Pro**: Multimodal model for image analysis and generation
  * **DeepSeek-V4** (February 2026): Next-generation model with integrated "deep thinking" reasoning capabilities

===== MoE Architecture =====

DeepSeek pioneered aggressive use of **Mixture of Experts (MoE)**, starting with DeepSeek-MoE in January 2024. The approach routes each token to a small subset of specialist "expert" sub-networks instead of running the entire network. For example, R1 activates approximately 37 billion of its 671 billion total parameters for any given input, requiring 2-4x fewer computational resources than an equivalent dense model.((source [[https://www.healthcare.digital/single-post/what-impact-has-deepseek-had-in-healthcare-and-ai-one-year-after-the-initial-hype|DeepSeek Impact Analysis]]))
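To make the sparse-activation idea concrete, here is a minimal sketch of a top-k MoE routing layer. It is purely illustrative: the layer size, expert count, and top-k value below are invented for readability and do not reflect DeepSeek's actual configuration or code.

<code python>
import numpy as np

# Toy mixture-of-experts layer: a router scores every expert, keeps the top_k,
# and only those experts process the token; the remaining parameters stay idle.
# All sizes are illustrative, not DeepSeek's real configuration.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

router_w = rng.normal(size=(d_model, n_experts))                   # routing weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Process one token vector of shape (d_model,) and return (d_model,)."""
    logits = x @ router_w                                          # score each expert
    chosen = np.argsort(logits)[-top_k:]                           # indices of the top_k experts
    gates = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()  # softmax over the chosen experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

token = rng.normal(size=d_model)
out = moe_layer(token)
active = top_k * d_model * d_model + d_model * n_experts
total = n_experts * d_model * d_model + d_model * n_experts
print(f"output shape: {out.shape}, active parameters: {active}/{total} ({active / total:.0%})")
</code>

At DeepSeek's scale, the same routing pattern is what lets R1 touch only about 37 billion of its 671 billion parameters per token.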
===== Training Cost Breakthrough =====

The roughly $5.6 million reported for training DeepSeek-V3, the base model behind R1, fundamentally repriced expectations for AI development:

^ Metric ^ DeepSeek ^ Western Competitors (e.g., GPT-4/5) ^
| Training Cost | ~$5.6M | $100M+ |
| API Input (per 1M tokens) | $0.14-0.55 | $1.25-15 |
| API Output (per 1M tokens) | $0.28-2.19 | $10-75 |

The cost efficiency was achieved through the MoE architecture, optimized training infrastructure, and efficient use of NVIDIA H800 chips (the export-compliant variant available in China).

===== Industry Impact =====

DeepSeek-R1's January 2025 launch caused NVIDIA's market capitalization to drop by approximately $600 billion as investors reassessed assumptions about the compute requirements for frontier AI. The release:

  * Shifted industry focus from raw scale to "cognitive density" and architectural efficiency
  * Made DeepSeek's chatbot application the most-downloaded app on the U.S. App Store (January 2025)
  * Accelerated adoption of efficient architectures across the industry
  * Boosted AI deployment in developing nations through affordable self-hosting((source [[https://www.healthcare.digital/single-post/what-impact-has-deepseek-had-in-healthcare-and-ai-one-year-after-the-initial-hype|DeepSeek Market Impact]]))

===== Open-Source Approach =====

All DeepSeek models are fully open-source and can be downloaded for self-hosting; smaller and distilled variants run on consumer hardware such as NVIDIA RTX 4090/5090 GPUs. There are no subscription fees, and the API is priced 10-50x cheaper than Western alternatives.

===== See Also =====

  * [[mistral_ai|Mistral AI]]
  * [[anthropic|Anthropic]]
  * [[xai_grok|xAI and Grok]]

===== References =====