NVIDIA announced the NVIDIA Vera Rubin platform is opening the next frontier of agentic AI, with seven new chips now in full production to scale the world?s largest AI factories. The platform brings together the NVIDIA Vera CPU, NVIDIA Rubin GPU, NVIDIA NVLink 6 Switch, NVIDIA ConnectX-9 SuperNIC, NVIDIA BlueField-4 DPU and NVIDIA Spectrum?-6 Ethernet switch, as well as the newly integrated NVIDIA Groq 3 LPU. Designed to operate together as one incredible AI supercomputer, the chips power every phase of AI ?
from massive-scale pretraining, post-training and test-time scaling to real-time agentic inference. Through deep codesign across compute, networking and storage, supported by an ecosystem of more than 80 NVIDIA MGX ecosystem partners with a global supply chain, NVIDIA Vera Rubin offers the most extensive NVIDIA POD-scale platform ? a supercomputer where multiple racks purpose-built for AI work together as one massive, coherent system.Integrating 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6, along with ConnectX-9 SuperNICs and BlueField-4 DPUs, Vera Rubin NVL72 delivers breakthrough efficiency ?
training large mixture-of-experts models with one-fourth the number of GPUs compared with the NVIDIA Blackwell platform and achieving up to 10x higher inference throughput per watt at one-tenth the cost per token.Designed for hyperscale AI factories worldwide, NVL72 scales seamlessly with NVIDIA Quantum-X800 InfiniBand and Spectrum-X Ethernet to sustain high utilization across massive GPU clusters while reducing time to train and total cost of ownership.Reinforcement learning and agentic AI workloads rely on large numbers of CPU-based environments to test and validate the results generated by models running on GPU systems. The NVIDIA Vera CPU Rack delivers dense, liquid-cooled infrastructure built on NVIDIA MGX, integrating 256 Vera CPUs to provide scalable, energy-efficient capacity with world- class single-threaded performance, unlocking agentic AI at scale.Integrated with Spectrum-X Ethernet networking, Vera CPU racks keep CPU environments tightly synchronized across the AI factory. Together with GPU compute racks, they provide the CPU foundation for large-scale agentic AI and reinforcement learning ? with Vera delivering results twice as efficiently and 50% faster than traditional CPUs.NVIDIA Groq 3 LPX Rack.
NVIDIA Groq 3 LPX marks a milestone in accelerated computing. Designed for the low-latency and large-context demands of agentic systems, LPX and Vera Rubin unite the extreme performance of both processors to deliver up to 35x higher inference throughput per megawatt and up to 10x more revenue opportunity for trillion-parameter models.At scale, a fleet of LPUs function as a giant single processor for fast, deterministic inference acceleration. The LPX rack with 256 LPU processors features 128GB of on-chip SRAM and 640 TB/s of scale-up bandwidth.
Deployed with Vera Rubin NVL72, Rubin GPUs and LPUs boost decode by jointly computing every layer of the AI model for every output token.Optimized for trillion-parameter models and million-token context, the codesigned LPX architecture pairs with Vera Rubin to maximize efficiency across power, memory and compute. The additional throughput per watt and token performance unlocks a new tier of ultra-premium, trillion-parameter, million-context inference, expanding revenue opportunity for all AI providers. Fully liquid cooled and built on MGX infrastructure, LPX integrates seamlessly into next-generation Vera Rubin AI factories to be available in the second half of this year.NVIDIA BlueField-4 STX Storage Rack.
The NVIDIA BlueField-4 STX rack-scale system is an AI-native storage infrastructure that extends GPU memory seamlessly across the POD. Powered by BlueField-4 ? combining the NVIDIA Vera CPU and NVIDIA ConnectX-9 SuperNIC ?
STX delivers a high-bandwidth shared layer optimized for storing and retrieving the massive key-value cache data generated by large language models and agentic AI workflows. NVIDIA DOCA Memos? a new DOCA framework that supercharges BlueField-4 storage ?
enables dedicated KV cache storage processing to boost inference throughput by up to 5x while significantly improving power efficiency compared with general-purpose storage architectures. The result is POD-wide context that delivers faster multi-turn interactions with AI agents, more scalable AI services and higher overall infrastructure utilization.Spectrum-6 SPX Ethernet is engineered to accelerate east-west traffic across AI factories. Configurable with either Spectrum-X Ethernet or NVIDIA Quantum-X800 InfiniBand switches, it delivers low-latency, high-throughput rack-to-rack connectivity at scale.Spectrum-X Ethernet Photonics with co-packaged optics achieves up to 5x greater optical power efficiency and 10x higher resiliency compared with traditional pluggable transceivers.NVIDIA, along with over 200 data center infrastructure partners, announced the NVIDIA DSX platform for Vera Rubin.
This includes DSX Max-Q to enable dynamic power provisioning across the entire AI factory, resulting in the deployment of 30% more AI infrastructure within a fixed-power data center. The new DSX Flex software enables AI factories to be grid-flexible assets, unlocking 100 gigawatts of stranded grid power.NVIDIA also today released the Vera Rubin DSX AI Factory reference design, a blueprint for codesigned AI infrastructure that maximizes tokens per watt and overall goodput, improving system resiliency and accelerating time to first production.By tightly integrating compute, networking, storage, power and cooling, the architecture increases energy efficiency and ensures AI factories can scale reliably under continuous, high-intensity workloads with maximum uptime.Vera Rubin-based products will be available from partners starting the second half of this year. This includes leading cloud providers Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure, along with NVIDIA Cloud Partners CoreWeave, Crusoe, Lambda, Nebius, Nscale and Together AI.Global system manufacturers Cisco, Dell Technologies, HPE, Lenovo and Supermicro are expected to deliver a wide range of servers based on Vera Rubin products, as well as Aivres, ASUS, Foxconn, GIGABYTE, Inventec, Pegatron, Quanta Cloud Technology (QCT), Wistron and Wiwynn.AI labs and frontier model developers including Anthropic, Meta, Mistral AI and OpenAI are looking to use the NVIDIA Vera Rubin platform to train larger, more capable models and to serve long-context, multimodal systems at lower latency and cost than with prior GPU generations.




















