
Nvidia CEO Jensen Huang shared previously unreleased specifications for the company’s Rubin graphics processing unit (GPU), due in 2026, and the Rubin Ultra, coming in 2027, and announced a new GPU called Feynman for 2028.
He provided the details during his keynote at the company’s GTC AI developer conference, being held this week in San Jose, California.
Nvidia will ship the Vera Rubin NVL144 system in the second half of 2026, he said; it will be three times faster than the Blackwell Ultra NVL72 system, which will ship in the second half of this year.
The Vera Rubin NVL144 will offer 3.6 exaflops of FP4 inference performance and 1.2 exaflops of FP8 training performance. The system will include a new CPU called Vera, which succeeds the current Grace CPUs and will have 88 custom cores and 176 threads.
It will also include the new Rubin GPU, which will succeed the Blackwell GPU and offer 50 petaflops of FP4 performance. The GPU will come with 288GB of HBM4, the next generation of the high-bandwidth memory that AI accelerators rely on to keep data flowing to their compute units.
The system will also include a faster NVLink 6 interconnect, with total data transfer speeds of 260TB/sec, twice the speed of the NVLink 5 in Blackwell Ultra NVL72 systems.
In the second half of 2027, Nvidia will ship the Rubin Ultra NVL576, which will be close to four times faster than Vera Rubin NVL144 systems.
“It’s … extreme scale up. Each rack is 600 kilowatts, 2.5 million parts, and obviously a whole lot of GPUs,” Huang said during the keynote.
The system will have 576 Rubin GPUs, 12,672 Vera CPU cores, 2,304 memory chips, 144 NVLink switches, 576 ConnectX-9 NICs, and 72 BlueField data processing units (DPUs). It will offer 15 exaflops of FP4 inference performance and 5 exaflops of FP8 performance, and will include faster HBM4e memory, which will transfer data at 4.6PB/sec, and the new NVLink 7 interconnect.
In 2028, Nvidia will release a GPU called Feynman, which will include next-generation HBM memory and will be paired with Vera CPUs in systems. Huang didn’t share additional information about the chip.
Huang also talked extensively about newer reasoning AI models, which work through problems step by step before answering. These models, which will drive agentic AI, generate many more tokens while reasoning, and that is where faster GPUs come in.
“The amount of tokens generated is … higher. Easily 100 times more,” Huang said.
Power consumption a question
He didn’t discuss the power consumption of the systems, but it’s fair to assume they will draw significantly more power, reaching hundreds of kilowatts per rack over the next few years as these systems grow, said Jim McGregor, principal analyst at Tirias Research.
Nvidia’s Blackwell NVL72 system, which is shipping now, draws about 120 kilowatts running FP4 workloads. The successor Blackwell Ultra NVL72 systems shipping later this year could draw 135 to 140 kilowatts, according to estimates published by TrendForce on Tuesday. But there are ways to curb power consumption, for example by using chiplets, smaller silicon dies that are packaged together to act as a single processor, McGregor noted.
Nvidia is prioritizing FP4, a 4-bit floating-point format that takes far less compute per operation than higher-precision formats, said Anshel Sag, principal analyst at Moor Insights & Strategy. It also reduces power consumption during inference.
“Using 4-bit floating point, we can quantize the model, use less energy to do [AI tasks]. And as a result, when you use less energy to do the same, you could do more,” Huang said.
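To make the quantization idea concrete, here is a minimal Python sketch, not Nvidia’s implementation, of rounding full-precision weights to the 4-bit E2M1 grid that FP4 hardware uses. The grid values are standard for the format; the per-tensor scaling and the function itself are illustrative assumptions.

import numpy as np

# The eight magnitudes representable in FP4 (E2M1), plus a sign bit.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(weights):
    """Round float32 weights to the nearest FP4 (E2M1) value.

    Values stay in float32 here for readability; real hardware packs
    two 4-bit codes per byte. Returns the quantized tensor and the
    per-tensor scale needed to dequantize.
    """
    scale = np.abs(weights).max() / FP4_GRID[-1]  # map the largest magnitude to 6.0
    scaled = np.abs(weights) / scale
    # Find the nearest grid point for every element via broadcasting.
    idx = np.abs(scaled[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(weights) * FP4_GRID[idx], scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4)).astype(np.float32)
quantized, scale = quantize_fp4(weights)
print("max quantization error:", np.abs(weights - quantized * scale).max())

Storing each weight in 4 bits instead of 16 or 32 also cuts memory traffic, which is where much of the energy in inference goes.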
Huang tied the AI revenue data centers generate to their power consumption, arguing that each generation’s GPU performance gains will let the same power budget produce proportionally more revenue.
“Your revenues are power limited. You can figure out what your revenues are going to be based on the power you have to work with. This is no different than many other industries,” Huang said.
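His argument reduces to a simple identity: revenue is roughly the power budget times tokens per joule times the price per token. The sketch below works through it with entirely hypothetical numbers; the 1MW budget, the efficiency figure, and the token price are placeholders, not figures from the keynote.

# All numbers here are hypothetical placeholders, not Nvidia figures.
POWER_BUDGET_W = 1_000_000      # power available for compute: 1 MW
TOKENS_PER_JOULE = 5            # assumed system-level inference efficiency
PRICE_PER_M_TOKENS = 2.00       # assumed price in dollars per million tokens
SECONDS_PER_YEAR = 365 * 24 * 3600

tokens_per_year = POWER_BUDGET_W * TOKENS_PER_JOULE * SECONDS_PER_YEAR
revenue_per_year = tokens_per_year / 1e6 * PRICE_PER_M_TOKENS
print(f"tokens/year: {tokens_per_year:.2e}")
print(f"revenue/year: ${revenue_per_year:,.0f}")

With power fixed, revenue scales linearly with tokens per joule, so a generation that doubles performance per watt doubles the revenue the same facility can produce, which is the point Huang was making.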
When data center economics, including flops, bandwidth, and watts, are taken into account, he said, “Rubin is going to drive the cost down significantly” compared to the older Hopper generation.
Partner announcements
Nvidia at GTC also introduced the Llama Nemotron family of reasoning models, aimed at building agentic AI platforms, and teamed up with Cisco on Cisco Secure AI Factory, which combines Cisco networking and security gear with Nvidia DPUs and third-party storage options.
Also at the conference, Dell announced PCs with Nvidia’s GPUs for developers who need to prototype AI models.
Source: Network World