====== Nvidia Nemotron ======

**Nvidia Nemotron** is a family of open Mixture-of-Experts (MoE) large language models developed by NVIDIA, optimized for efficient agentic AI applications including multi-agent workflows, reasoning, and high-throughput inference. The Nemotron 3 series, announced in December 2025, is the latest generation, with models spanning compact edge deployment to complex reasoning at scale. ((Source: [[https://nvidianews.nvidia.com/news/nvidia-debuts-nemotron-3-family-of-open-models|NVIDIA — Nemotron 3 Family]]))

===== Model Family =====

The Nemotron 3 lineup uses hybrid Mamba-Transformer MoE architectures for efficiency and scalability:

^ Model ^ Total Parameters ^ Active Parameters ^ Context Window ^ Primary Use Case ^
| Nemotron 3 Nano | 30B | 3B | 1M tokens | Software assistance, content generation, information retrieval |
| Nemotron 3 Super | ~120B | 12B | — | Multi-agent scenarios, IT automation, collaborative agents |
| Nemotron 3 Ultra | ~500B | Up to 50B/token | — | Complex reasoning, long-horizon planning, strategic decision-making |

The Super and Ultra variants use **LatentMoE** for hardware-aware expert routing, and all models support **4-bit NVFP4 precision** on Blackwell GPUs for reduced memory use and accelerated inference without accuracy loss.
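The total-vs-active parameter split in the table above comes from MoE routing: a router picks a small subset of experts per token, so most parameters stay idle on any given forward pass. The toy sketch below illustrates the idea with tiny made-up dimensions; it is not NVIDIA's implementation, and all sizes and names here are illustrative assumptions.

```python
import math
import random

# Toy illustration of top-k Mixture-of-Experts (MoE) routing: many experts
# are stored, but only TOP_K of them run per token. This is why a model like
# Nemotron 3 Nano can hold ~30B total parameters yet activate only ~3B/token.
# All dimensions here are tiny toy values, not the real architecture.

random.seed(0)

D = 8             # toy hidden dimension
NUM_EXPERTS = 10  # toy expert count
TOP_K = 1         # experts activated per token

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.02) for _ in range(cols)] for _ in range(rows)]

experts = [rand_matrix(D, D) for _ in range(NUM_EXPERTS)]  # one weight matrix per expert
router = rand_matrix(D, NUM_EXPERTS)                       # scores each expert for a token

def matvec(m, v):
    """Multiply a len(v)-row matrix by vector v."""
    return [sum(m[j][i] * v[j] for j in range(len(v))) for i in range(len(m[0]))]

def moe_forward(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = matvec(router, x)
    top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i])[-TOP_K:]
    weights = [math.exp(logits[i]) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]       # softmax over the chosen experts only
    out = [0.0] * D
    for w, i in zip(weights, top):
        for j, v in enumerate(matvec(experts[i], x)):
            out[j] += w * v
    return out

x = [random.gauss(0, 1) for _ in range(D)]
y = moe_forward(x)

active_fraction = TOP_K / NUM_EXPERTS            # fraction of expert parameters used per token
print(len(y), active_fraction)                   # → 8 0.1
```

With TOP_K = 1 of 10 experts, only a tenth of the expert parameters participate per token, mirroring (at toy scale) the 3B-active-of-30B-total ratio quoted for Nemotron 3 Nano.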
((Source: [[https://research.nvidia.com/labs/nemotron/Nemotron-3/|NVIDIA Research — Nemotron 3]]))

===== Architecture =====

  * **Hybrid Mamba-Transformer MoE** — Combines the efficiency of Mamba state-space models with Transformer attention for long-context processing
  * **LatentMoE** — Hardware-optimized expert routing for the Super and Ultra models
  * **NVFP4 training** — 4-bit precision on Blackwell GPUs reduces the memory footprint while maintaining accuracy

===== Performance =====

^ Metric ^ Result ^
| Nano throughput | 3.3x higher than Qwen3-30B-A3B on an H200 GPU |
| Nano vs. predecessor | 4x throughput improvement over Nemotron 2 Nano |
| Super throughput | 5x higher for complex multi-agent tasks |
| Reasoning tokens | Up to 60% fewer than the previous generation |
| Multi-agent scaling | Supports dozens to hundreds of agents in workflows |

Nemotron 3 Nano outperforms GPT-OSS-20B and Qwen3-30B on agentic benchmarks. The Super variant, released at GTC 2026 (March), extends the family's standing among open models for agentic AI. ((Source: [[https://www.datacamp.com/blog/nvidia-nemotron-3|DataCamp — Nvidia Nemotron 3]]))

===== Ecosystem =====

NVIDIA provides training datasets, reinforcement learning environments, and libraries alongside the Nemotron models to enable transparent agent development. The models are positioned as independent open alternatives in the agentic AI space, competing with models from Meta (Llama), Mistral, and DeepSeek. ((Source: [[https://nvidianews.nvidia.com/news/nvidia-debuts-nemotron-3-family-of-open-models|NVIDIA — Nemotron 3 Family]]))

===== See Also =====

  * [[nvidia_vera_rubin|Nvidia Vera Rubin]]
  * [[nvidia_omniverse_digital_twins|Nvidia Omniverse Digital Twins]]
  * [[ai_superfactory|AI Superfactory]]

===== References =====