====== DeepSeek ======

**DeepSeek** is a Chinese AI research company headquartered in Hangzhou, founded in 2023 by **Liang Wenfeng**, who also founded the quantitative hedge fund High-Flyer Capital. DeepSeek disrupted the global AI industry by demonstrating that frontier-class models could be trained for a fraction of the cost of Western competitors, using efficient Mixture of Experts (MoE) architectures. The company's models are fully open-source and available for self-hosting.((source [[https://www.britannica.com/money/DeepSeek|DeepSeek on Britannica]]))

===== Key Models =====

==== DeepSeek-V3 ====

The flagship general-purpose model, released in December 2024, with V3.2 following in December 2025. DeepSeek-V3 handles writing, analysis, coding, and data tasks, matching the quality of GPT-5 and Claude Sonnet 4.6 at significantly lower cost. Its context window holds approximately 2,000 pages of text.((source [[https://mysummit.school/blog/en/deepseek-review-2026/|DeepSeek Review 2026]]))

==== DeepSeek-R1 ====

A dedicated reasoning model released in January 2025, excelling at step-by-step logical reasoning and outperforming OpenAI's o1-mini. R1 displays its full reasoning trace to users. Its release triggered a major market shock when investors realized that its base model, DeepSeek-V3, had been trained for approximately **$5.6 million** using optimized NVIDIA H800 chips, compared to over $100 million for comparable Western models.((source [[https://www.voiceflow.com/blog/what-is-deepseek|What is DeepSeek]]))

==== Other Models ====

  * **DeepSeek-Janus Pro**: Multimodal model for image analysis and generation
  * **DeepSeek-V4** (February 2026): Next-generation model with integrated "deep thinking" reasoning capabilities

===== MoE Architecture =====

DeepSeek pioneered aggressive use of **Mixture of Experts (MoE)**, starting with DeepSeek-MoE in January 2024. The approach routes each token to a small subset of specialist "expert" sub-networks instead of running the entire network. For example, R1 activates approximately 37 billion of its 671 billion total parameters for any given input, requiring 2-4x fewer computational resources than an equivalent dense model.((source [[https://www.healthcare.digital/single-post/what-impact-has-deepseek-had-in-healthcare-and-ai-one-year-after-the-initial-hype|DeepSeek Impact Analysis]]))
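To make the sparse-activation idea concrete, here is a minimal sketch of a top-k MoE routing layer. It is purely illustrative: the layer size, expert count, and top-k value below are invented for readability and do not reflect DeepSeek's actual configuration or code.

<code python>
import numpy as np

# Toy mixture-of-experts layer: a router scores every expert, keeps the top_k,
# and only those experts process the token; the remaining parameters stay idle.
# All sizes are illustrative, not DeepSeek's real configuration.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

router_w = rng.normal(size=(d_model, n_experts))                   # routing weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Process one token vector of shape (d_model,) and return (d_model,)."""
    logits = x @ router_w                                          # score each expert
    chosen = np.argsort(logits)[-top_k:]                           # indices of the top_k experts
    gates = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()  # softmax over the chosen experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

token = rng.normal(size=d_model)
out = moe_layer(token)
active = top_k * d_model * d_model + d_model * n_experts
total = n_experts * d_model * d_model + d_model * n_experts
print(f"output shape: {out.shape}, active parameters: {active}/{total} ({active / total:.0%})")
</code>

At DeepSeek's scale, the same routing pattern is what lets R1 touch only about 37 billion of its 671 billion parameters per token.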
===== Training Cost Breakthrough =====

The roughly $5.6 million reported for training DeepSeek-V3, the base model behind R1, fundamentally repriced expectations for AI development:

^ Metric ^ DeepSeek ^ Western Competitors (e.g., GPT-4/5) ^
| Training Cost | ~$5.6M | $100M+ |
| API Input (per 1M tokens) | $0.14-0.55 | $1.25-15 |
| API Output (per 1M tokens) | $0.28-2.19 | $10-75 |

The cost efficiency was achieved through the MoE architecture, optimized training infrastructure, and efficient use of NVIDIA H800 chips (the export-compliant variant available in China).

===== Industry Impact =====

DeepSeek-R1's January 2025 launch caused NVIDIA's market capitalization to drop by approximately $600 billion as investors reassessed assumptions about the compute requirements for frontier AI. The release:

  * Shifted industry focus from raw scale to "cognitive density" and architectural efficiency
  * Made DeepSeek's chatbot application the most-downloaded app on the U.S. App Store (January 2025)
  * Accelerated adoption of efficient architectures across the industry
  * Boosted AI deployment in developing nations through affordable self-hosting((source [[https://www.healthcare.digital/single-post/what-impact-has-deepseek-had-in-healthcare-and-ai-one-year-after-the-initial-hype|DeepSeek Market Impact]]))

===== Open-Source Approach =====

All DeepSeek models are fully open-source and can be downloaded for self-hosting; smaller and distilled variants run on consumer hardware such as NVIDIA RTX 4090/5090 GPUs. There are no subscription fees, and the API is priced 10-50x cheaper than Western alternatives.

===== See Also =====

  * [[mistral_ai|Mistral AI]]
  * [[anthropic|Anthropic]]
  * [[xai_grok|xAI and Grok]]

===== References =====