
HPE and Nvidia have expanded their joint, prepackaged service offerings aimed at helping enterprises support AI workloads.
At Nvidia’s GTC AI Conference this week in San Jose, Calif., HPE said it would add features to its Nvidia co-developed Private Cloud AI package, which integrates Nvidia GPUs, networks, and software with HPE’s AI memory, computing, and GreenLake cloud support.
The Nvidia AI Computing by HPE package also integrates Nvidia’s Inference Microservices (NIM) to help customers quickly develop and deploy AI training and inference applications. It also supports HPE’s Data Fabric architecture, which aims to supply a unified and consistent data layer for data access across on-premises data centers, public clouds, and edge environments, with the goal of providing a single, logical view of data regardless of where it resides, according to HPE.
The vendors are adding to the HPE Private Cloud AI service a new developer system that lowers the bar for enterprises looking to start developing AI, said Cheri Williams, senior vice president and general manager of GreenLake Flex Solutions at HPE.
“The developer’s edition is designed as an accessible starting point for AI development capabilities. It has the same predefined software tools that our standard private cloud AI offers and includes a scalable foundation that works with the larger configurations in the portfolio,” Williams said. “The idea is to give enterprise developers and data engineers the flexibility to build and iterate AI quickly and easily move it to more performant infrastructure.”
This new turnkey developer system can prove and validate AI projects faster than ever, wrote Michael Corrado, senior worldwide marketing manager for HPE, in a blog post about the news.
“Accelerated by 2 NVIDIA H100 NVL, [HPE Private Cloud AI Developer System] includes an integrated control node, end-to-end AI software that includes NVIDIA AI Enterprise and HPE AI Essentials, and 32TB of integrated storage providing everything a developer needs to prove and scale AI workloads,” Corrado wrote.
In addition, HPE Private Cloud AI includes support for new Nvidia GPUs and blueprints that deliver proven and functioning AI workloads like data extraction with a single click, Corrado wrote.
HPE data fabric software
HPE has also extended support for its Data Fabric technology across the Private Cloud offering. The Data Fabric aims to create a unified and consistent data layer that spans across diverse locations, including on-premises data centers, public clouds, and edge environments to provide a single, logical view of data, regardless of where it resides, HPE said.
“The new release of Data Fabric Software Fabric is the data backbone of the HPE Private Cloud AI data lakehouse and provides an Iceberg interface for PC-AI users to data hosted throughout their enterprise. This unified data layer allows data scientists to connect to external stores and query that data as Iceberg-compliant data without moving the data,” wrote HPE’s Ashwin Shetty in a blog post. “Apache Iceberg is the emerging format for AI and analytical workloads. With this new release Data Fabric becomes an Iceberg endpoint for AI engineering. This makes it simple for AI engineering data scientists to easily point to the data lakehouse data source and run a query directly against it. Data Fabric takes care of metadata management, secure access, joining files or objects across any source on-premises or in the cloud in the global namespace.”
In addition, HPE Private Cloud AI now supports pre-validated Nvidia blueprints to help customers deploy AI workloads.
AI infrastructure optimization
Aiming to help customers manage their AI infrastructure, HPE enhanced its OpsRamp management package, which monitors servers, networks, storage, databases, and applications. The company added GPU optimization support to OpsRamp, meaning the platform can now manage AI-native software stacks and deliver full-stack observability for training and inference workloads running on large Nvidia accelerated computing clusters, HPE stated.
The new GPU optimization capability is available through HPE Private Cloud AI and as a standalone offering for large clusters.
Specifically, OpsRamp can now identify workload imbalances, optimize job scheduling, and ensure efficient resource utilization by monitoring GPU and CPU utilization across an AI cluster, wrote Taruna Gandhi, head of marketing for HPE OpsRamp Software, in a blog post.
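OpsRamp’s internals are not public, but the imbalance detection described above can be sketched in a few lines: compare each node’s average GPU utilization against the cluster mean and flag outliers. The function name, data shape, and threshold below are hypothetical, purely for illustration.

```python
# Illustrative sketch only: flag cluster nodes whose GPU utilization
# deviates sharply from the cluster average, suggesting a scheduling
# or data-loading bottleneck. Names and thresholds are hypothetical.
from statistics import mean

def find_imbalanced_nodes(gpu_util_by_node, threshold=0.3):
    """Return nodes whose average GPU utilization (in percent) deviates
    from the cluster mean by more than `threshold` * 100 points."""
    node_avgs = {node: mean(samples) for node, samples in gpu_util_by_node.items()}
    cluster_avg = mean(node_avgs.values())
    return sorted(
        node for node, avg in node_avgs.items()
        if abs(avg - cluster_avg) > threshold * 100
    )

# Example: node-3 sits nearly idle while the rest of the cluster is busy.
samples = {
    "node-1": [92, 95, 90],
    "node-2": [88, 91, 89],
    "node-3": [12, 10, 15],
}
print(find_imbalanced_nodes(samples))  # -> ['node-3']
```

A real monitoring stack would feed this from an agent or telemetry API rather than a hard-coded dict, but the underlying comparison is the same.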
“OpsRamp can proactively resolve potential issues by automating responses to certain events, such as reducing the GPU’s clock speed or even powering it down to prevent damage, scaling of resources based on workload demands, as well as automated patching and upgrading of operating systems and software on compute nodes,” Gandhi wrote.
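The automated responses Gandhi describes amount to mapping monitored events to protective actions. A toy version of that mapping, with entirely hypothetical thresholds and action names (this is not OpsRamp’s actual policy engine), might look like:

```python
# Illustrative only: map a monitored GPU temperature to a protective
# action, mirroring the "reduce clock speed or power down" remediation
# described in the article. Thresholds and names are hypothetical.
def remediate(gpu_temp_c):
    """Pick a protective action based on GPU temperature in Celsius."""
    if gpu_temp_c >= 95:
        return "power_down"    # last resort: protect the hardware
    if gpu_temp_c >= 85:
        return "reduce_clock"  # throttle to shed heat, keep the job alive
    return "no_action"

print(remediate(88))  # -> reduce_clock
```

In practice such rules would be evaluated continuously against streaming telemetry and would also cover the scaling, patching, and upgrade actions the article mentions.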
It can also forecast future resource needs and optimize resource allocation by analyzing historical performance and utilization data, and it can monitor power consumption and resource utilization, which is especially critical in large AI deployments where energy costs can be substantial, Gandhi stated.
HPE AI servers
On the server front, HPE said it would release new servers utilizing Nvidia Blackwell Ultra and Nvidia Blackwell platforms.
For example, the new Nvidia GB300 NVL72 by HPE will support large customers looking to build big, complex AI clusters capable of training trillion-parameter models, the company stated. In addition, the company will add HPE ProLiant Compute XD servers to support Nvidia HGX B300 NVL16, which includes 16 Nvidia Blackwell GPUs, for customers looking to train and fine-tune large AI models, the vendor stated.
Other new boxes include:
- HPE ProLiant Compute DL384b Gen12 with the Nvidia GB200 Grace Blackwell NVL4 Superchip aimed at AI workloads including scientific computing, graph neural network (GNN) training, and AI inference applications.
- HPE ProLiant Compute DL380a Gen12 with the new Nvidia RTX PRO 6000 Blackwell Server Edition is a PCIe-based data center box aimed at supporting enterprise AI inferencing and visual computing workloads.
Source: Network World