IBM expands Nvidia GPU options for cloud customers

IBM is offering expanded access to Nvidia GPUs on IBM Cloud to help enterprise customers advance their AI implementations, including large language model (LLM) training. IBM Cloud users can now access Nvidia H100 Tensor Core GPU instances in virtual private cloud and managed Red Hat OpenShift environments.

The addition of the H100 Tensor Core GPU instances fills out a family of Nvidia GPUs and software that IBM already supports.

The Nvidia H100 Tensor Core GPU can enable up to 30X faster inference performance than the A100 Tensor Core GPU, according to Rohit Badlaney, general manager of IBM Cloud Product and Industry Platforms, who wrote a blog post about the new GPU support. The H100 instances will give IBM Cloud customers a range of processing capabilities while also addressing the cost of enterprise-wide AI tuning and inferencing, Badlaney wrote.

“Businesses can start small, training small-scale models, fine-tuning models, or deploying applications like chatbots, natural language search, and using forecasting tools using Nvidia L40S and L4 Tensor Core GPUs,” Badlaney wrote. “As their needs grow, IBM Cloud customers can adjust their spend accordingly, eventually harnessing the H100 for the most demanding AI and High-Performance Computing use cases.”

IBM said the IBM Cloud network is built to handle the increased workloads, both across clusters and within a single server.

“The IBM Cloud network is all Ethernet based. For multiple H100 system deployment in a cluster, we are building advanced computing hubs in Washington, D.C. and Frankfurt that allow clients to connect multiple systems through RoCE/RDMA setup that allows up to 3.2 Tbps for GPU-to-GPU communication,” IBM stated. “For a single server deployment, the GPUs are connected using Nvidia NVLink which provides a high-speed, direct, point-to-point connection between GPUs and NVSwitch, Nvidia’s high-speed switching fabric that connects multiple GPUs. Together the technologies make up a Nvidia DGX turnkey configuration of Nvidia A100 or H100 Tensor Core GPUs.”

The IBM Cloud services include a variety of multi-level security protocols designed to protect AI and HPC processes and guard against data leakage and data privacy concerns, according to Badlaney. “It also includes built-in controls to establish infrastructure and data guardrails for AI workloads,” he wrote.

In addition, IBM Cloud includes deployment automation capabilities. “IBM Cloud automates the deployment of AI-powered applications, to help address the time and errors that could occur with manual configuration,” Badlaney wrote. “It also provides essential services such as AI lifecycle management solutions, serverless platform, storage, security and solutions to help clients monitor their compliance.”

Clients can also utilize IBM’s watsonx AI studio, data lakehouse, and governance toolkit for more intensive AI development, Badlaney stated.

IBM Cloud plus Nvidia

The H100 deployment is the latest in a series of technology partnerships between IBM Cloud and Nvidia, which also partners with a number of other cloud providers.

For example, earlier this year IBM said it is among the first to access new Nvidia generative AI microservices that customers and developers can use to create and deploy custom applications optimized for Nvidia’s GPUs.

In addition, IBM offers Nvidia L40S and Nvidia L4 Tensor Core GPUs, as well as support for Red Hat Enterprise Linux AI and OpenShift AI to help enterprises develop and support AI workloads.

IBM has also integrated Nvidia GPUs into its Watson AI platform to accelerate various AI workloads, including deep learning, natural language processing, and computer vision.
