Nvidia Nemotron is a family of open Mixture-of-Experts (MoE) large language models developed by NVIDIA, optimized for efficient agentic AI applications including multi-agent workflows, reasoning, and high-throughput inference. The Nemotron 3 series, announced in December 2025, is the latest generation, with models spanning use cases from compact edge deployment to complex reasoning at scale. 1)
The Nemotron 3 lineup uses hybrid Mamba-Transformer MoE architectures for efficiency and scalability:
| Model | Total Parameters | Active Parameters | Context Window | Primary Use Case |
|---|---|---|---|---|
| Nemotron 3 Nano | 30B | 3B | 1M tokens | Software assistance, content generation, information retrieval |
| Nemotron 3 Super | ~120B | 12B | — | Multi-agent scenarios, IT automation, collaborative agents |
| Nemotron 3 Ultra | ~500B | Up to 50B/token | — | Complex reasoning, long-horizon planning, strategic decision-making |
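The total-versus-active split in the table above comes from MoE routing: each token is dispatched to only a few experts, so only a fraction of the total parameters participate in any forward pass. A minimal top-k routing sketch can illustrate this; the expert count and k below are illustrative, not Nemotron's actual configuration, and Nemotron's LatentMoE routing is hardware-aware rather than this plain softmax gate.

```python
import math

def topk_route(router_logits, k):
    """Generic top-k MoE routing sketch: pick the k highest-scoring experts
    and softmax-normalize their gate weights over just those k."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    exps = [math.exp(router_logits[i]) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

# Illustrative only: routing each token to 2 of 16 experts means roughly
# 2/16 of the expert parameters are active per token, which is how a model
# like Nano can hold 30B total parameters but activate only ~3B per token.
gates = topk_route([0.1 * i for i in range(16)], 2)
print(gates)  # two (expert_index, weight) pairs whose weights sum to 1.0
```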
The Super and Ultra variants use LatentMoE for hardware-aware expert routing, and all models support 4-bit NVFP4 precision on Blackwell GPUs, reducing memory use and accelerating inference without accuracy loss. 2)
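To see roughly what 4-bit precision buys, here is a back-of-the-envelope weight-memory calculation using the parameter counts from the table above. This is a sketch only: it ignores NVFP4's per-block scale factors, activation memory, and KV-cache, so real footprints will be somewhat larger.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in GB: params * bits / 8 bits-per-byte,
    with no allowance for quantization scales or runtime buffers."""
    return num_params * bits_per_param / 8 / 1e9

# Nemotron 3 Nano's 30B total parameters:
print(weight_memory_gb(30e9, 16))  # BF16  -> 60.0 GB
print(weight_memory_gb(30e9, 4))   # NVFP4 -> 15.0 GB, a 4x reduction
```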
| Metric | Result |
|---|---|
| Nano throughput | 3.3x higher than Qwen3-30B-A3B on H200 GPU |
| Nano vs predecessor | 4x throughput improvement over Nemotron 2 Nano |
| Super throughput | 5x higher for complex multi-agent tasks |
| Reasoning tokens | Up to 60% fewer reasoning tokens than previous generation |
| Multi-agent scaling | Supports dozens to hundreds of agents in workflows |
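Two of the metrics above compound in practice: per-token throughput and the number of reasoning tokens generated. Assuming the gains compose multiplicatively and nothing else bottlenecks (an assumption; the source does not state how these figures interact), Nano's 3.3x throughput combined with 60% fewer reasoning tokens would shorten end-to-end reasoning latency considerably.

```python
def effective_speedup(throughput_gain, token_reduction):
    """Naively compose a per-token throughput gain with a reduction in
    tokens generated: total time scales as tokens / throughput."""
    return throughput_gain / (1.0 - token_reduction)

# Under the stated (unverified) multiplicative assumption:
print(effective_speedup(3.3, 0.60))  # 3.3 / 0.4 = 8.25x end-to-end
```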
Nemotron 3 Nano outperforms GPT-OSS-20B and Qwen3-30B on agentic benchmarks, and the Super variant, released at GTC in March 2026, extends the family's position among open models for agentic AI. 3)
NVIDIA provides training datasets, reinforcement learning environments, and libraries alongside Nemotron models to enable transparent agent development. The models are positioned as independent open alternatives in the agentic AI space, competing with models from Meta (Llama), Mistral, and DeepSeek. 4)