====== Nvidia Vera Rubin Platform ======

The **Nvidia Vera Rubin** platform is a comprehensive AI and HPC system integrating Rubin GPUs, Vera CPUs, advanced networking, and storage components. Announced at GTC 2025 as the successor to the Blackwell architecture, Vera Rubin delivers up to 5x Blackwell's inference performance and up to 10x lower cost per token at rack scale. ((Source: [[https://hashrateindex.com/blog/nvidia-vera-rubin-nvl72-specs-breakdown/|Hashrate Index — Vera Rubin NVL72 Specs]]))

===== Rubin GPU =====

^ Specification ^ Detail ^
| Process Node | TSMC 3nm |
| Transistors | 336 billion (two reticle-sized compute dies plus two I/O tiles) |
| Memory | 288 GB HBM4 at 22 TB/s bandwidth |
| Inference Performance | 50 PFLOPS NVFP4 (5x Blackwell) |
| Training Performance | 35 PFLOPS NVFP4 (3.5x Blackwell) |
| Transformer Engine | 3rd generation, with hardware-accelerated adaptive compression |
| TDP | ~1.8 kW (Max-Q) to ~2.3 kW (Max-P) |

The Rubin CPX variant features 128 GB of GDDR7 for massive-context inference (million-token workloads); the KV-cache sizing sketch in the back-of-envelope checks below illustrates why contexts of that length need memory on this scale. The NVL144 CPX platform delivers 8 exaflops of AI performance with 100 TB of fast memory. ((Source: [[https://nvidianews.nvidia.com/news/nvidia-unveils-rubin-cpx-a-new-class-of-gpu-designed-for-massive-context-inference|NVIDIA — Rubin CPX Announcement]]))

===== Vera CPU =====

^ Specification ^ Detail ^
| Cores | 88 custom Armv9.2 Olympus cores (176 threads via spatial multithreading) |
| Memory | Up to 1.5 TB SOCAMM2 LPDDR5X at 1.2 TB/s bandwidth |
| GPU Connectivity | 1.8 TB/s NVLink-C2C (roughly 7x a PCIe Gen 6 x16 link) |
| Performance | 2x the Grace CPU used in Blackwell systems |
| Features | Scalable Coherency Fabric (SCF); confidential computing; agentic reasoning support |

The Vera CPU handles data movement, analytics, and orchestration, pairing with Rubin GPUs or operating standalone for HPC and cloud workloads. ((Source: [[https://www.nvidia.com/en-us/data-center/technologies/rubin/|NVIDIA — Rubin Platform]]))

===== NVL72 Rack Configuration =====

The primary deployment unit for Vera Rubin is the NVL72 rack:

^ Component ^ Specification ^
| Compute | 72 Rubin GPUs + 36 Vera CPUs |
| Inference | 3.6 EFLOPS NVFP4 |
| Training | 2.5 EFLOPS |
| GPU Memory | 20.7 TB HBM4 (1.6 PB/s aggregate bandwidth) |
| CPU Memory | 54 TB LPDDR5X |
| Scale-Up | NVLink 6 at 260 TB/s rack-scale bandwidth |
| Cooling | 100% liquid-cooled (45°C inlet) |
| Serviceability | Modular cable-free trays (18x faster servicing vs. Blackwell) |

These rack aggregates follow directly from the per-device specifications and are re-derived in the back-of-envelope checks below. The NVL72 scales to 40 racks in a POD configuration, delivering 60 exaflops of aggregate performance. ((Source: [[https://hashrateindex.com/blog/nvidia-vera-rubin-nvl72-specs-breakdown/|Hashrate Index — Vera Rubin NVL72 Specs]]))

===== Networking =====

  * **NVLink 6**: 260 TB/s scale-up bandwidth; enables a 576-GPU-die domain (world size) in Rubin Ultra ((Source: [[https://hashrateindex.com/blog/nvidia-vera-rubin-nvl72-specs-breakdown/|Hashrate Index — Vera Rubin NVL72 Specs]]))
  * **ConnectX-9 SuperNIC**: 1.6 Tb/s per GPU, with 8 NICs per tray
  * **Spectrum-6 / Quantum-CX9**: photonics-based switching for Ethernet and InfiniBand scale-out
  * **BlueField-4 DPU**: 64-core Grace CPU plus ConnectX-9, with 2x the bandwidth, 3x the memory, and 6x the compute of BlueField-3

The interconnect ratios quoted here and in the Vera CPU table are worked out in the back-of-envelope checks below.
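===== Back-of-Envelope Checks =====

The rack-level figures in the NVL72 table follow from the per-device specifications, so they can be re-derived mechanically. The Python sketch below does exactly that; the only assumption is the decimal unit convention (1 TB = 1,000 GB), which appears to match how the headline figures were rounded.

<code python>
# Re-derive NVL72 rack aggregates from the per-device specs in the tables above.
# Assumption: decimal units (1 TB = 1,000 GB), matching the headline rounding.

RUBIN_NVFP4_INFER_PFLOPS = 50   # per Rubin GPU
RUBIN_NVFP4_TRAIN_PFLOPS = 35   # per Rubin GPU
RUBIN_HBM4_GB = 288             # per Rubin GPU
RUBIN_HBM4_TB_S = 22            # per-GPU HBM4 bandwidth
VERA_LPDDR5X_TB = 1.5           # per Vera CPU
GPUS, CPUS = 72, 36             # devices per NVL72 rack

print(f"Inference:     {RUBIN_NVFP4_INFER_PFLOPS * GPUS / 1000:.1f} EFLOPS NVFP4")  # 3.6
print(f"Training:      {RUBIN_NVFP4_TRAIN_PFLOPS * GPUS / 1000:.2f} EFLOPS")        # 2.52 (quoted: 2.5)
print(f"GPU memory:    {RUBIN_HBM4_GB * GPUS / 1000:.1f} TB HBM4")                  # 20.7
print(f"HBM bandwidth: {RUBIN_HBM4_TB_S * GPUS / 1000:.2f} PB/s")                   # 1.58 (quoted: 1.6)
print(f"CPU memory:    {VERA_LPDDR5X_TB * CPUS:.0f} TB LPDDR5X")                    # 54
</code>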
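The interconnect ratios can be checked the same way. The sketch below assumes a PCIe Gen 6 x16 link at roughly 256 GB/s bidirectional (an assumption, not a figure from the sources above) and divides the NVLink 6 rack-scale bandwidth evenly across the 72 GPUs.

<code python>
# Interconnect ratio checks.
# Assumption: PCIe Gen 6 x16 ~= 256 GB/s bidirectional (~128 GB/s per direction).

NVLINK_C2C_GB_S = 1800       # Vera-to-Rubin NVLink-C2C (Vera CPU table)
PCIE_GEN6_X16_GB_S = 256     # assumed baseline
NVLINK6_RACK_TB_S = 260      # NVLink 6 rack-scale bandwidth (NVL72 table)
GPUS = 72

print(f"NVLink-C2C vs PCIe Gen 6 x16: {NVLINK_C2C_GB_S / PCIE_GEN6_X16_GB_S:.1f}x")  # ~7.0x
print(f"NVLink 6 share per GPU:       {NVLINK6_RACK_TB_S / GPUS:.1f} TB/s")          # ~3.6 TB/s
</code>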
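Finally, the million-token workloads that Rubin CPX targets are easiest to appreciate through KV-cache arithmetic. The model configuration below is hypothetical (a grouped-query-attention transformer with 64 layers, 8 KV heads, head dimension 128, and an FP8 cache), chosen only to show the order of magnitude; real model configurations vary.

<code python>
# Hypothetical KV-cache sizing for million-token inference.
# The model configuration is illustrative, not a published spec.

layers, kv_heads, head_dim = 64, 8, 128
bytes_per_elem = 1            # FP8 KV cache
context_tokens = 1_000_000

# K and V each store (kv_heads * head_dim) elements per token per layer.
bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
total_gb = bytes_per_token * context_tokens / 1e9

print(f"KV cache per token:    {bytes_per_token / 1024:.0f} KiB")  # 128 KiB
print(f"KV cache at 1M tokens: {total_gb:.0f} GB")                 # ~131 GB
</code>

At roughly 131 GB for a single million-token context, the cache alone is on the order of Rubin CPX's 128 GB of GDDR7, which is the scale a dedicated context-processing GPU has to be built around.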
===== Timeline =====

First deployments are expected in H2 2026 from AWS, Google Cloud, Microsoft Azure, Oracle Cloud, and CoreWeave. Volume production of the supporting memory technologies (HBM4, SOCAMM2) is already underway. NVL144 CPX racks targeting 600 kW are projected for 2027. ((Source: [[https://hashrateindex.com/blog/nvidia-vera-rubin-nvl72-specs-breakdown/|Hashrate Index — Vera Rubin NVL72 Specs]]))

===== See Also =====

  * [[hbm4e_memory|HBM4E Memory]]
  * [[nvidia_nemotron|Nvidia Nemotron]]
  * [[ai_superfactory|AI Superfactory]]
  * [[nvidia_omniverse_digital_twins|Nvidia Omniverse Digital Twins]]

===== References =====