Nvidia releases reference architectures for AI factories

Nvidia has been talking about AI factories for some time, and now it’s coming out with some reference designs to help build them. The chipmaker has released a series of what it calls Enterprise Reference Architectures (Enterprise RA), which are blueprints to simplify the building of AI-oriented data centers.

Building an AI-oriented data center is no easy task, even by data center construction standards. And for most organizations, this would be the first time building such a facility. After all, who has built AI factories before? The idea behind Nvidia’s Enterprise RAs is to minimize the pain of building these infrastructures and to help organizations ensure their AI factories can evolve and scale up in the future.

A reference architecture provides full-stack hardware and software recommendations. Bob Pette, vice president and general manager for enterprise platforms at Nvidia, said in a blog post that each Enterprise RA covers Nvidia-certified server configurations, AI-optimized networking through its Spectrum-X Ethernet networking platform and BlueField-3 DPUs, and Nvidia AI Enterprise software as the base for production AI.

The one thing that the reference architectures do not cover is storage, since Nvidia does not supply storage. Instead, storage hardware and software are left to Nvidia’s certified partners, such as Dell Technologies, Pure Storage, and NetApp.

Solutions based on Nvidia’s Enterprise RAs are available from its partners, including Cisco, Dell, HPE, Lenovo and Supermicro, with 23 certified data center partners and 577 systems listed in Nvidia’s catalog.

On the software side is Nvidia’s AI Enterprise platform, which includes microservices such as Nvidia NeMo and Nvidia NIM for building and deploying AI applications, along with Nvidia Base Command Manager Essentials, which provides tools for infrastructure provisioning, workload management and resource monitoring.
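To make the software layer a little more concrete, below is a minimal sketch (not from the article) of how an application might call a NIM inference microservice once it is deployed in such an environment. NIM microservices expose an OpenAI-compatible HTTP API; the endpoint URL and model name here are hypothetical placeholders for whatever a given AI factory actually runs.

```python
# Hypothetical example: querying a locally deployed NIM microservice
# over its OpenAI-compatible chat completions endpoint.
import requests

NIM_ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder URL for a local deployment

payload = {
    "model": "meta/llama-3.1-8b-instruct",  # example model id; substitute the deployed one
    "messages": [{"role": "user", "content": "Summarize what an AI factory is."}],
    "max_tokens": 128,
}

response = requests.post(NIM_ENDPOINT, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because the interface follows the OpenAI convention, existing application code can typically point at the NIM endpoint with little more than a URL change, which is part of what the reference architectures are meant to simplify.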

Obviously, the biggest benefit of using Nvidia’s reference architectures is being able to get up and running faster, as customers have instructions laid out for them rather than having to figure things out for themselves.

However, there is another advantage, and that has to do with scale. Nvidia says the reference architectures are designed so that they can easily be upgraded when new hardware and software become available.

“Enterprise RAs reduce the time and cost of deploying AI infrastructure solutions by providing a streamlined approach for building flexible and cost-effective accelerated infrastructure,” Pette said in his blog.

Source: Network World