IBM expands Nvidia GPU options for cloud customers

IBM is offering expanded access to Nvidia GPUs on IBM Cloud to help enterprise customers advance their AI implementations, including large language model (LLM) training. IBM Cloud users can now access Nvidia H100 Tensor Core GPU instances in virtual private cloud and managed Red Hat OpenShift environments.

The addition of the H100 Tensor Core GPU instances fills out a family of Nvidia GPUs and software that IBM already supports.

The Nvidia H100 Tensor Core GPU can deliver up to 30X faster inference performance than the current A100 Tensor Core GPU, according to Rohit Badlaney, general manager of IBM Cloud Product and Industry Platforms, who wrote a blog post about the new GPU support. The addition will give IBM Cloud customers a range of processing capabilities while also addressing the cost of enterprise-wide AI tuning and inferencing, he wrote.

“Businesses can start small, training small-scale models, fine-tuning models, or deploying applications like chatbots, natural language search, and using forecasting tools using Nvidia L40S and L4 Tensor Core GPUs,” Badlaney wrote. “As their needs grow, IBM Cloud customers can adjust their spend accordingly, eventually harnessing the H100 for the most demanding AI and High-Performance Computing use cases.”

IBM says the IBM Cloud network is built to handle the increased workloads.

“The IBM Cloud network is all Ethernet based. For multiple H100 system deployment in a cluster, we are building advanced computing hubs in Washington, D.C. and Frankfurt that allow clients to connect multiple systems through RoCE/RDMA setup that allows up to 3.2 Tbps for GPU-to-GPU communication,” IBM stated. “For a single server deployment, the GPUs are connected using Nvidia NVLink which provides a high-speed, direct, point-to-point connection between GPUs and NVSwitch, Nvidia’s high-speed switching fabric that connects multiple GPUs. Together the technologies make up a Nvidia DGX turnkey configuration of Nvidia A100 or H100 Tensor Core GPUs.”

The IBM Cloud services include a variety of multi-level security protocols designed to protect AI and HPC processes and guard against data leakage and data privacy concerns, according to Badlaney. “It also includes built-in controls to establish infrastructure and data guardrails for AI workloads,” he wrote.

In addition, IBM Cloud includes deployment automation capabilities. “IBM Cloud automates the deployment of AI-powered applications, to help address the time and errors that could occur with manual configuration,” Badlaney wrote. “It also provides essential services such as AI lifecycle management solutions, serverless platform, storage, security and solutions to help clients monitor their compliance.”

Clients can also utilize IBM’s watsonx AI studio, data lakehouse, and governance toolkit for more intensive AI development, Badlaney stated.

IBM Cloud plus Nvidia

The H100 deployment is the latest in a series of technology partnerships between IBM Cloud and Nvidia, which also partners with a number of other cloud providers.

For example, earlier this year IBM said it is among the first to access new Nvidia generative AI microservices that customers and developers can use to create and deploy custom applications optimized for Nvidia’s GPUs.

In addition, IBM offers Nvidia L40S and Nvidia L4 Tensor Core GPUs, as well as support for Red Hat Enterprise Linux AI and OpenShift AI to help enterprises develop and support AI workloads.

IBM also has integrated Nvidia GPUs into its Watson AI platform to accelerate various AI workloads, including deep learning, natural language processing, and computer vision.

Source: Network World