Oracle has started taking pre-orders for 131,072 Nvidia Blackwell GPUs in the cloud via its Oracle Cloud Infrastructure (OCI) Supercluster to aid large language model (LLM) training and other use cases, the company announced at the CloudWorld 2024 conference.
The launch of an offering with this many Blackwell GPUs, sold as part of the Grace Blackwell (GB200) superchip, which pairs an Nvidia Grace CPU with Blackwell GPUs, is significant because enterprises worldwide face a shortage of high-bandwidth memory (HBM), a key component used in making GPUs.
The current wait time for HBM, and consequently for GPUs, is at least 18 months, according to top executives at several GPU makers, meaning orders placed now won't be fulfilled before 2026.
Meanwhile, GPUs have become increasingly important to large technology firms such as AWS, Google, and OpenAI as demand for generative AI continues to grow.
LLM providers are in a constant race to release the next better-performing model, which requires training on more data and, in turn, faster GPUs to keep training times manageable.
OCI Superclusters with Nvidia GB200 NVL72 liquid-cooled bare-metal instances will use NVLink and NVLink Switch to enable up to 72 Blackwell GPUs to communicate with each other at an aggregate bandwidth of 129.6 TBps in a single NVLink domain, Oracle and Nvidia said in a joint statement.
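As a back-of-the-envelope check (an inference from the figures quoted above, not something Oracle or Nvidia stated), the 129.6 TBps aggregate divided evenly across 72 GPUs works out to 1.8 TBps of NVLink bandwidth per GPU:

```python
# Sanity check on the NVLink figures quoted above. Assumes the 129.6 TBps
# aggregate is split evenly across the 72 GPUs in one NVL72 NVLink domain.
GPUS_PER_DOMAIN = 72
AGGREGATE_TBPS = 129.6

per_gpu_tbps = AGGREGATE_TBPS / GPUS_PER_DOMAIN
print(f"Per-GPU NVLink bandwidth: {per_gpu_tbps:.1f} TBps")  # 1.8 TBps
```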
The Blackwell GPUs are expected to be made available in the first half of 2025, the companies added without sharing any indicative pricing.
Presently, Oracle offers OCI Superclusters with Nvidia H100 GPUs. Superclusters with H200 GPUs are expected to be made available later this year.
However, Oracle is not the only cloud services provider offering Nvidia GPU-based infrastructure for model training.
AWS, Microsoft, and Google Cloud have also partnered with Nvidia to offer the latter’s GPUs on the cloud.
Nvidia first announced its intent to partner with cloud service providers to offer GPUs on the cloud in early 2023. Since then, the company has steadily expanded its partnership with the hyperscalers. Earlier this year, the chipmaker formally announced its plans to offer Blackwell GPUs on the cloud.
More GB200 GPUs than any other hyperscaler
Although Oracle is not the only cloud services provider to offer the latest Nvidia GPUs on the cloud, it said it is offering more GB200 GPUs than any other hyperscaler.
Oracle’s claim of offering six times as many GPUs as any other hyperscaler takes a dig at Project Ceiba from AWS and Nvidia, which is powered by 20,736 Blackwell GPUs. Ceiba has already been made available for research purposes, according to AWS.
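The "six times" figure follows directly from the two GPU counts reported in the article:

```python
# Ratio of Oracle's announced GB200 count to Project Ceiba's,
# using the numbers cited above.
oracle_gpus = 131_072
ceiba_gpus = 20_736

ratio = oracle_gpus / ceiba_gpus
print(f"Oracle's offering is ~{ratio:.1f}x the size of Project Ceiba")  # ~6.3x
```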
AWS, like Oracle, has not yet revealed pricing for its Blackwell GPU-backed cloud infrastructure service, which is expected to become available in 2025. AWS has also not revealed the total number of GPUs it expects to offer through that service.
Google Cloud, too, has not released pricing for its Blackwell GPU-backed service, with the GPUs expected to be available in early 2025. In April, Google Cloud said it would offer the service to its customers but did not disclose how many GPUs enterprises would be able to order.
Source: Network World