
Hyperscaler CoreWeave on Thursday announced that it is the first AI cloud provider to deploy Nvidia's latest GB300 NVL72 systems, in collaboration with Dell Technologies, data center provider Switch, and critical infrastructure and services firm Vertiv.
The CoreWeave GB300 NVL72 is a rack-scale, liquid-cooled platform that unifies 72 Nvidia Blackwell Ultra GPUs, 36 Arm-based Nvidia Grace CPUs, and 36 Nvidia BlueField-3 DPUs into a single system. The deployment is “tightly integrated with CoreWeave’s cloud-native software stack, including CoreWeave Kubernetes Service (CKS) and Slurm on Kubernetes (SUNK),” CoreWeave said in a release.
Additionally, Dell noted, CoreWeave’s adoption of Dell Integrated Racks with Nvidia GB300 NVL72 “pushes the boundaries of scalability for cloud services. The deployment will support CoreWeave’s expanding AI cloud platform, simplifying everything from large language models training and reasoning to real-time inferencing.”
“This system is engineered to handle the massive computational demands of test-time scaling inference, a critical component for deploying state-of-the-art AI models,” added Peter Salanki, CoreWeave’s chief technology officer, in a blog. “As AI models continue to rapidly grow in size and complexity, the need for purpose-built AI infrastructure will only continue to grow at the same pace.”
A big win for Dell
It’s a big win for Dell, said Matt Kimball, vice president and principal analyst, datacenter compute and storage at Moor Insights & Strategy, who described the launch as a “big accomplishment for Dell on a couple of fronts. First, there’s the ‘first to market’ aspect to it, which is good at setting the pace (and bragging rights).”
It is, he said, “a big deal and says a lot about the relationship between Dell and Nvidia, and Dell and the neocloud community. Dell made a run at the cloud market and pulled back around 2016. Though it re-entered a few years later, I think its embracing of OCP and the modular hardware system (MHS) design, along with its tight integration with Nvidia, has made it a legitimate player in a market that has long been dominated by the likes of white boxes and Supermicro.”
The deployment, Kimball said, “brings Dell quality to the commodity space. Wins like this really validate what Dell has been doing in reshaping its portfolio to accommodate the needs of the market — both in the cloud and the enterprise.”
Although concerns were voiced last year that Nvidia’s next-generation Blackwell data center processors suffered significant overheating problems when installed in high-capacity server racks, Kimball said a repeat performance is unlikely.
Nvidia, said Kimball, “has been very disciplined in its approach with its GPUs and not shipping silicon until it is ready. And Dell almost doubles down on this maniacal quality focus. I don’t mean to sound like I have blind faith, but I’ve watched both companies over the last several years be intentional in delivering product in volume. Especially as the competitive market starts to shape up more strongly, I expect there is an extremely high degree of confidence in quality.”
CoreWeave ‘has one purpose’
He said, “like Lambda Labs, Crusoe and others, [CoreWeave] seemingly has one purpose (for now): deliver GPU capacity to the market. While I expect these cloud providers will expand in services, I think for now the type of customer employing services is on the early adopter side of AI. From an enterprise perspective, I have to think that organizations well into their AI journey are the consumers of CoreWeave.”
“CoreWeave is also being utilized by a lot of the model providers and tech vendors playing in the AI space,” Kimball pointed out. “For instance, it’s public knowledge that Microsoft, OpenAI, Meta, IBM and others use CoreWeave GPUs for model training and more. It makes sense. These are the customers that truly benefit from the performance lift that we see from generation to generation.”
Source: Network World