Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management. It is widely used for automated rollouts and rollbacks, horizontal scaling, storage orchestration, and more. For many organizations, Kubernetes is a key component of their infrastructure.
A critical step in installing and scaling Kubernetes is ensuring that it makes proper use of the other components of the infrastructure. NVIDIA Operators streamline installing and managing GPUs and NICs on Kubernetes to make the software stack ready to run the most resource-demanding workloads, such as AI, ML, DL, and HPC, in the cloud, data center, and at the edge. The NVIDIA Operators consist of the GPU Operator and the Network Operator; both are open source and based on the Operator Framework.
NVIDIA GPU Operator
The NVIDIA GPU Operator is packaged as a Helm chart. It installs and manages the lifecycle of the software components needed to run GPU-accelerated applications on Kubernetes: GPU Feature Discovery, the NVIDIA driver, the Kubernetes device plugin, the NVIDIA Container Toolkit, and DCGM monitoring.
The GPU Operator lets infrastructure teams manage the lifecycle of GPUs used with Kubernetes at the cluster level, eliminating the need to manage each node individually. Previously, infrastructure teams had to manage two operating system images, one for GPU nodes and one for CPU nodes. With the GPU Operator, infrastructure teams can use the CPU image on GPU worker nodes as well.
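To make the cluster-level model concrete, here is a minimal sketch of what a workload might look like once the device plugin is running. It assumes the plugin advertises GPUs under the `nvidia.com/gpu` resource name; the helper `gpu_pod_manifest`, the Pod name, and the image `example.com/cuda-sample:latest` are hypothetical placeholders, not anything taken from the Operator itself.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a Pod manifest (as a dict) that requests `gpus` NVIDIA GPUs.

    Assumption: the device plugin exposes GPUs as the extended
    resource "nvidia.com/gpu". The name and image are placeholders.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image; substitute any CUDA-enabled container image.
    manifest = gpu_pod_manifest("gpu-smoke-test", "example.com/cuda-sample:latest")
    # kubectl accepts JSON as well as YAML, so this output can be piped to it.
    print(json.dumps(manifest, indent=2))
```

The output can be piped to `kubectl apply -f -`. Nothing in the manifest refers to a driver version or a node image, which is the point of managing those components at the cluster level.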
NVIDIA Network Operator
The Network Operator automates the deployment and management of the host networking components in a Kubernetes cluster. It includes the Kubernetes device plugin, the NVIDIA driver, the NVIDIA Peer Memory driver, and the Multus and macvlan CNIs. These components previously had to be installed manually; the Network Operator automates their installation, streamlining deployment and enabling accelerated computing with a simpler user experience.
Used independently or together, NVIDIA Operators simplify GPU and SmartNIC configurations on Kubernetes and are compatible with partner cloud platforms. To learn more about these components and how the NVIDIA Operators solve the key challenges of running AI, ML, DL, and HPC workloads and simplify initial setup and Day 2 operations, check out the on-demand webinar “Accelerating Kubernetes with NVIDIA Operators”.
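As an illustration of how the Multus and macvlan CNIs fit together, the sketch below builds a generic Multus `NetworkAttachmentDefinition` that describes a macvlan secondary network. This is the upstream Multus resource, not necessarily what the Network Operator generates; the Operator may expose its own higher-level resources for the same purpose. The helper `macvlan_attachment`, the network name, the interface `eth1`, and the subnet are hypothetical values chosen for the example.

```python
import json


def macvlan_attachment(name, master, subnet):
    """Build a Multus NetworkAttachmentDefinition for a macvlan network.

    Assumptions: `master` is a host interface present on every node that
    will run attached Pods, and host-local IPAM is acceptable for the
    example. All argument values used below are placeholders.
    """
    cni_config = {
        "cniVersion": "0.3.1",
        "type": "macvlan",
        "master": master,
        "mode": "bridge",
        "ipam": {"type": "host-local", "subnet": subnet},
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name},
        # Multus expects the CNI configuration as a JSON string.
        "spec": {"config": json.dumps(cni_config)},
    }


if __name__ == "__main__":
    nad = macvlan_attachment("macvlan-net", "eth1", "192.168.10.0/24")
    print(json.dumps(nad, indent=2))
```

A Pod would then opt in to the secondary network with the annotation `k8s.v1.cni.cncf.io/networks: macvlan-net`, and Multus would attach the extra interface alongside the default cluster network.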