Confidential Computing on NVIDIA H100 GPUs for Secure and Trustworthy AI

Hardware virtualization is an effective way to isolate workloads in virtual machines (VMs) from the physical hardware and from each other. This offers improved…

Hardware virtualization is an effective way to isolate workloads in virtual machines (VMs) from the physical hardware and from each other. This offers improved security, particularly in a multi-tenant environment. Yet, security risks such as in-band attacks, side-channel attacks, and physical attacks can still happen, compromising the confidentiality, integrity, or availability of your data and applications.

Until recently, protecting data was limited to data-in-motion, such as moving a payload across the Internet, and data-at-rest, such as encryption of storage media. Data-in-use, however, remained vulnerable.

NVIDIA Confidential Computing offers a solution for securely processing data and code in use, preventing unauthorized users from both access and modification. When running AI training or inference, the data and the code must be protected. Often the input data includes personally identifiable information (PII) or enterprise secrets, and the trained model is highly valuable intellectual property (IP). Confidential computing is the ideal solution to protect both AI models and data.

NVIDIA is at the forefront of confidential computing, collaborating with CPU partners, cloud providers, and independent software vendors (ISVs) to ensure that the change from traditional, accelerated workloads to confidential, accelerated workloads will be smooth and transparent.

The NVIDIA H100 Tensor Core GPU is the first ever GPU to introduce support for confidential computing. It can be used in virtualized environments, either with traditional VMs or in Kubernetes deployments, using Kata to launch confidential containers in microVMs.

This post focuses on the traditional virtualization workflow with confidential computing.

NVIDIA Confidential Computing using hardware virtualization

Confidential computing is the protection of data in use by performing computation in a hardware-based, attested trusted execution environment (TEE), per the Confidential Computing Consortium.

The NVIDIA H100 GPU meets this definition as its TEE is anchored in an on-die hardware root of trust (RoT). When it boots in CC-On mode, the GPU enables hardware protections for code and data. A chain of trust is established through the following:

A GPU boot sequence, with a secure and measured boot
A security protocols and data models (SPDM) session to securely connect to the driver in a CPU TEE
The generation of a cryptographically signed set of measurements called an attestation report.

The user of the confidential computing environment can check the attestation report and only proceed if it is valid and correct.

Secure AI across hardware, firmware, and software

NVIDIA continues to improve the security and integrity of its GPUs in each generation. Since the NVIDIA Volta V100 Tensor Core GPU, NVIDIA has provided AES authentication on the firmware that runs on the device. This authentication ensures that you can trust that the bootup firmware was neither corrupted nor tampered with.

Through the NVIDIA Turing architecture and the NVIDIA Ampere architecture, NVIDIA added additional security features including encrypted firmware, firmware revocation, fault injection countermeasures, and now, in NVIDIA Hopper, the on-die RoT, and measured/attested boot.

To achieve confidential computing on NVIDIA H100 GPUs, NVIDIA needed to create new secure firmware and microcode, and enable confidential computing capable paths in the CUDA driver, and establish attestation verification flows. This hardware, firmware, and software stack provides a complete confidential computing solution that includes the protection and integrity of both code and data.

With the release of CUDA 12.2 Update 1, the NVIDIA H100 Tensor Core GPU, the first confidential computing GPU, is ready to run confidential computing workloads with our early access release.

Hardware security for NVIDIA H100 GPUs

The NVIDIA Hopper architecture was first brought to market in the NVIDIA H100 product, which includes the H100 Tensor Core GPU chip and 80 GB of High Bandwidth Memory 3 (HBM3) on a single package. There are multiple products using NVIDIA H100 GPUs that can support confidential computing, including the following:

NVIDIA H100 PCIe
NVIDIA H100 NVL
NVIDIA HGX H100

There are three supported confidential computing modes of operation:

CC-Off: Standard NVIDIA H100 operation. None of the confidential computing-specific features are active.
CC-On: The NVIDIA H100 hardware, firmware, and software have fully activated all the confidential computing features. All firewalls are active, and all performance counters have been disabled to prevent their use in side-channel attacks.
CC-DevTools: Developers count on NVIDIA Developer Tools to help profile and trace their code so that they can understand system bottlenecks to improve overall performance. In CC-DevTools mode, the GPU is in a partial CC mode that will match the workflows of CC-On mode, but with security protections disabled and performance counters enabled. This enables the NSys Trace tool to run and help resolve any performance issues seen in CC-On mode.

The controls to enable or disable confidential computing are provided as in-band PCIe commands from the hypervisor host.

Operating NVIDIA H100 GPUs in confidential computing mode

NVIDIA H100 GPU in confidential computing mode works with CPUs that support confidential VMs (CVMs). CPU-based confidential computing enables users to run in a TEE, which prevents an operator with access to either the hypervisor, or even the system itself, from access to the contents of memory of the CVM or confidential container. However, extending a TEE to include a GPU introduces an interesting challenge, as the GPU is blocked by the CPU hardware from directly accessing the CVM memory.

To solve this, the NVIDIA driver, which is inside the CPU TEE, works with the GPU hardware to move data to and from GPU memory. It does so through an encrypted bounce buffer, which is allocated in shared system memory and accessible to the GPU. Similarly, all command buffers and CUDA kernels are also encrypted and signed before crossing the PCIe bus.

After the CPU TEE’s trust has been extended to the GPU, running CUDA applications is identical to running them on a GPU with CC-Off. The CUDA driver and GPU firmware take care of the required encryption workflows in CC-On mode transparently.

Specific CPU hardware SKUs are required to enable confidential computing with the NVIDIA H100 GPU. The following CPUs have the required features for confidential computing:

All AMD Genoa or Milan CPUs have Secure Encrypted Virtualization with Secure Nested Paging (SEV-SNP) enabled
Intel Sapphire Rapids CPUs use Trusted Domain eXtensions (TDX), which is in early access and only enabled for select customers.

NVIDIA has worked extensively to ensure that your CUDA code “Just Works” with confidential computing enabled. When these steps have been taken to ensure that you have a secure system with proper hardware, drivers, and a passing attestation report, your CUDA applications should run without any changes.

Specific hardware and software versions are required to enable confidential computing for the NVIDIA H100 GPU. The following table shows an example stack that can be used with our first release of software.

ComponentVersionCPUAMD Milan+GPUH100 PCIeSBIOSASRockRack: BIOS Firmware Version L3.12C or later
Supermicro: BIOS Firmware Version 2.4 or later
For other servers, check with the manufacturer for the minimum SBIOS to enable confidential computing.HypervisorUbuntu KVM/QEMU 22.04+OSUbuntu 22.04+Kernel5.19-rc6_v4 (Host and guest)qemu>= 6.1.50 (branch – snp-v3)ovmf>= commit (b360b0b589)NVIDIA VBIOSVBIOS version: 96.00.5E.00.01 and laterNVIDIA DriverR535.86Table 1. Confidential computing for NVIDIA H100 GPU software and hardware stack example

Table 1 provides a summary of hardware and software requirements. For more information about using nvidia-smi, as well as various OS and BIOS level settings, see the NVIDIA Confidential Computing Deployment Guide.

Benefits of NVIDIA Hopper H100 Confidential Computing for trustworthy AI

The confidential computing capabilities of the NVIDIA H100 GPU provide enhanced security and isolation against the following in-scope threat vectors:

Software attacks
Physical attacks
Software rollback attacks
Cryptographical attacks
Replay attacks

Because of the NVIDIA H100 GPUs’ hardware-based security and isolation, verifiability with device attestation, and protection from unauthorized access, an organization can improve the security from each of these attack vectors. Improvements can occur with no application code change to get the best possible ROI.

In the following sections, we discuss how the confidential computing capabilities of the NVIDIA H100 GPU are initiated and maintained in a virtualized environment.

Hardware-based security and isolation on virtual machines

To achieve full isolation of VMs on-premises, in the cloud, or at the edge, the data transfers between the CPU and NVIDIA H100 GPU are encrypted. A physically isolated TEE is created with built-in hardware firewalls that secure the entire workload on the NVIDIA H100 GPU.

The confidential computing initialization process for the NVIDIA H100 GPU is multi-step.

Enable CC mode:

The host requests enabling CC mode persistently.
The host triggers the GPU reset for the mode to take effect.

Boot the device:

GPU firmware scrubs the GPU state and memory.
GPU firmware configures a hardware firewall to prevent unauthorized access and then enables PCIe.

Initialize the tenant:

The GPU PF driver uses SPDM for session establishment and the attestation report.
The tenant attestation service gathers measurements and the device certificate using NVML APIs.
CUDA programs are permitted to use the GPU.

Shut down the tenant:

The host triggers a physical function level reset (FLR) to reset the GPU and returns to the device boot.
GPU firmware scrubs the GPU state and memory.

Figure 1. NVIDIA H100 Confidential Computing initialization process

Figure 1 shows that the hypervisor can set the confidential computing mode of the NVIDIA H100 GPU as required during provisioning. The APIs to enable or disable confidential computing are provided as both in-band PCIe commands from the host and out-of-band BMC commands.

Verifiability with device attestation

Attestation is the process where users, or the relying party, want to challenge the GPU hardware and its associated driver, firmware, and microcode, and receive confirmation that the responses are valid, authentic, and configured correctly before proceeding.

Before a CVM uses the GPU, it must authenticate the GPU as genuine before including it in its trust boundary. It does this by retrieving a device identity certificate (signed with a device-unique ECC-384 key pair) from the device or calling the NVIDIA Device Identity Service. The device certificate can be fetched by the CVM using nvidia-smi.

Verification of this certificate against the NVIDIA Certificate Authority will verify that the device was manufactured by NVIDIA. The device-unique, private identity key is burned into the fuses of each H100 GPU. The public key is retained for the provisioning of the device certificate.

In addition, the CVM must also ensure that the GPU certificate is not revoked. This can be done by calling out to the NVIDIA Online Certificate Service Protocol (OCSP).

We provide the NVIDIA Remote Attestation Service (NRAS) as the primary method of validating GPU attestation reports. You also have the option to perform local verification for air-gapped situations. Of course, stale local data regarding revocation status or integrity of the verifier may still occur with local verification.

No application code changes

Leverage all the benefits of confidential computing with no code changes required to your GPU-accelerated workloads in most cases. Use NVIDIA GPU-optimized software to accelerate end-to-end AI workloads on H100 GPUs while maintaining security, privacy, and regulatory compliance. When these steps have been taken to ensure that you have a secure system, with proper hardware, drivers, and a passing attestation report, executing your CUDA application should be transparent to you.

Accelerated computing performance with confidential computing

NVIDIA GPU Confidential Computing architecture is compatible with those CPU architectures that also provide application portability from non-confidential to confidential computing environments.

It should not be surprising that confidential computing workloads on the GPU perform close to non-confidential computing mode when the amount of compute is large compared to the amount of input data.

When the compute per input data bytes is low, the overhead of communicating across non-secure interconnects limits the application throughput. This is because the basics of accelerated computing remain unchanged when running CUDA applications in confidential computing mode.

In confidential computing mode, the following performance primitives are at par with non-confidential mode:

GPU raw compute performance: The compute engines execute plaintext code on plaintext data resident in GPU memory.
GPU memory bandwidth: The on-package HBM memory is considered secure against everyday physical attack tools and is not encrypted.

The following performance primitives are impacted by additional encryption and decryption overheads:

CPU-GPU interconnect bandwidth: It is limited by CPU encryption performance, which we currently measure at roughly 4 GBytes/sec.
Data transfer throughput across the non-secure interconnects: This primitive incurs the latency overhead of encrypted bounce buffers in unprotected memory used to stage the confidential data.

Figure 2. Example topology of four GPU systems with GPU confidential computing configuration

There is an additional overhead of encrypting GPU command buffers, synchronization primitives, exception metadata, and other internal driver data exchanged between the GPU and the confidential VM running on the CPU. Encrypting these data structures prevents side-channel attacks on the user data.

CUDA Unified Memory has long been used by developers to use the same virtual address pointer from the CPU and the GPU, greatly simplifying application code. In confidential computing mode, the unified memory manager encrypts all pages being migrated across the non-secure interconnect.

Secure AI workloads with early-access confidential computing for NVIDIA H100

Confidential computing offers a solution for securely protecting data and code in use while preventing unauthorized users from both access and modification. The NVIDIA Hopper H100 PCIe or HGX H100 8-GPU now includes confidential computing enablement as an early access feature.

To get started with confidential computing on NVIDIA H100 GPUs, configuration steps, supported versions, and code examples are covered in Deployment Guide for Trusted Environments. The NVIDIA Hopper H100 GPU has several new hardware-based features that enable this level of confidentiality and interoperates with CVM TEEs from the major CPU vendors. For more information, see the Confidential Compute on NVIDIA Hopper H100 whitepaper.

Because of the NVIDIA H100 GPU’s hardware-based security and isolation, verifiability through device attestation, and protection from unauthorized access, customers and end users can improve security with no application code changes.

Source:: NVIDIA