Nvidia fixes chip glitch, says production back on schedule for Q4

Nvidia has fixed a glitch in its latest chip, Blackwell, and expects to begin ramping production in Q4, its CEO told analysts on an earnings call Wednesday.

Reports of the glitch earlier this month raised concerns among enterprise IT executives.

“We executed a change to the Blackwell GPU mask to improve production yields,” said Nvidia CEO Jensen Huang. “Blackwell production ramp is scheduled to begin in the fourth quarter and continue into fiscal year ’26. In Q4, we expect to get several billion dollars in Blackwell revenue. Hopper shipments are expected to increase in the second half of fiscal 2025.” The company’s fiscal 2025 began on January 29, 2024.

“The change to the mask is complete,” he said. “There were no functional changes necessary. And so, we’re sampling functional samples of Blackwell, Grace Blackwell, and a variety of system configurations as we speak. There are something like 100 different types of Blackwell-based systems that are built that were shown at Computex, and we’re enabling our ecosystem to start sampling those. The functionality of Blackwell is as it is, and we expect to start production in Q4.”

The CEO added: “Blackwell will start shipping out in billions of dollars at the end of this year.”

Huang did not discuss the nature of either the problem or the fix.

Technology analyst Jeff Kagan said he doubts the delay will have any meaningful impact on enterprise IT operations.

“We have learned to always expect these kinds of glitches. Fortunately, they don’t stop progress and growth, although they can slow things down from time to time,” Kagan said. “In the end, this is not a long-term problem (as much as) one of many short-term issues that will be resolved.”

On the analyst call, Huang also laid out his view of the future of enterprise computing and of how profoundly AI is going to change the nature of hardware and computing operations.

“We drove down the cost of training large language models or training deep learning so incredibly that it is now possible to have gigantic scale models, multitrillion-parameter models, and pretrain them on just about the world’s knowledge corpus, and let the model go figure out how to understand human language representation, and how to codify knowledge into its neural networks, and how to learn reasoning, and so which caused the generative AI revolution,” Huang said.

The CEO also argued that the financial underpinnings of IT environments are changing rapidly.

“Whenever you double the size of a model, you also have to more than double the size of the data set to go train it. And so, the amount of flops necessary in order to create that model goes up quadratically,” he said. “It’s not unexpected to see that the next-generation models could take 10, 20, 40 times more compute than last generation. We have to continue to drive the generational performance up quite significantly so we can drive down the energy consumed and drive down the cost necessary to do it.”
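The arithmetic behind that "quadratically" can be sketched in a few lines. The sketch below assumes the widely used rule of thumb that training compute is roughly 6 × parameters × training tokens; that approximation, the function names, and the parameter and token counts are illustrative assumptions, not figures from Nvidia or from the call.

```python
def training_flops(params: float, tokens: float) -> float:
    """Rough training cost, assuming the rule of thumb C ~= 6 * N * D.

    params: number of model parameters (N)
    tokens: number of training tokens (D)
    """
    return 6.0 * params * tokens


def compute_multiplier(model_scale: float, data_scale: float) -> float:
    """Factor by which compute grows when the model grows by model_scale
    and the data set grows by data_scale (under the same assumption)."""
    return model_scale * data_scale


# Hypothetical baseline: 1 trillion parameters, 20 trillion tokens.
base = training_flops(1e12, 2e13)

# Double the model and double the data: compute grows about 4x.
doubled = training_flops(2e12, 4e13)
print(f"2x model, 2x data: {doubled / base:.1f}x compute")

# "More than double" the data (here 2.5x, an arbitrary choice) pushes it past 4x.
print(f"2x model, 2.5x data: {compute_multiplier(2.0, 2.5):.1f}x compute")
```

Under this assumption, doubling both the model and the data set quadruples the compute bill, and growing the data faster than the model pushes the multiplier higher still, which is consistent with Huang's expectation of large generational jumps in compute.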

Source: Network World