Ray-Tracing Validation at the Driver Level

For developers working on Microsoft DirectX ray-tracing applications, ray-tracing validation is here to help you improve performance, find hard-to-debug issues, and root cause crashes. 

Unlike existing debug solutions, ray-tracing validation performs checks at the driver level, which enables it to identify potential problems that cannot be caught by tools such as the D3D12 Debug Layer. Warnings and errors are delivered straight from the driver to the application with a callback, where they can be processed through existing application-side debugging or logging systems. 

I highly recommend enabling ray-tracing validation during any feature development and functional testing wherever possible.

Requirements

Ray-tracing validation requires an NVIDIA driver, version 551.61 or later, and the NVIDIA API (NvAPI).

Driver versions from 545.00 up to 551.61 also support ray-tracing validation, but validation can be prohibitively slow with these earlier drivers. We recommend upgrading to driver 551.61 or later.

Enabling ray-tracing validation

To enable ray-tracing validation, set the NV_ALLOW_RAYTRACING_VALIDATION=1 environment variable on your development machine.
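Because validation depends on this variable, a startup check that prints a clear hint can save time when validation unexpectedly does not turn on. The following C++ sketch is a hypothetical helper, not part of NvAPI. It assumes that the documented value `1` is the only one that should count as enabled.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not an NvAPI function): returns true only when the
// variable's value is exactly "1", the documented opt-in value.
inline bool isRaytracingValidationAllowed(const char* envValue)
{
    return envValue != nullptr && std::strcmp(envValue, "1") == 0;
}

// Reads NV_ALLOW_RAYTRACING_VALIDATION from the current process environment.
inline bool isRaytracingValidationAllowedByEnvironment()
{
    return isRaytracingValidationAllowed(std::getenv("NV_ALLOW_RAYTRACING_VALIDATION"));
}
```

For example, call `isRaytracingValidationAllowedByEnvironment()` before enabling validation and log a reminder to set the variable when it returns false.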

Next, create a D3D12 device and initialize NvAPI with NvAPI_Initialize.

Enable ray-tracing validation on this device with the NvAPI_D3D12_EnableRaytracingValidation function.

Implement a callback of type NVAPI_D3D12_RAYTRACING_VALIDATION_MESSAGE_CALLBACK that forwards messages to your debugger or logging system of choice, and register it with NvAPI_D3D12_RegisterRaytracingValidationMessageCallback.
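Because the callback receives the pUserData pointer supplied at registration, it can route messages into an existing application-side logger instead of stderr. The C++ sketch below shows that pattern. To stay compilable without nvapi.h, it uses a stand-in severity enum and omits `__stdcall`. `RtValidationSeverity`, `RtValidationLog`, and `logValidationMessage` are hypothetical names, and a real callback must match the NVAPI_D3D12_RAYTRACING_VALIDATION_MESSAGE_CALLBACK signature.

```cpp
#include <string>
#include <vector>

// Stand-in for NVAPI_D3D12_RAYTRACING_VALIDATION_MESSAGE_SEVERITY so this
// sketch compiles without nvapi.h; use the real NvAPI enum in production code.
enum class RtValidationSeverity { Error, Warning, Unknown };

// Hypothetical application-side log sink, passed to the callback as pUserData.
struct RtValidationLog
{
    std::vector<std::string> entries;
    int errorCount = 0;
};

// Same parameter order as the NvAPI callback: user data, severity, code,
// message, details. Appends one formatted entry to the sink.
inline void logValidationMessage(void* pUserData, RtValidationSeverity severity,
                                 const char* messageCode, const char* message,
                                 const char* messageDetails)
{
    auto* sink = static_cast<RtValidationLog*>(pUserData);
    const char* severityString = "unknown";
    switch (severity)
    {
    case RtValidationSeverity::Error:   severityString = "error"; ++sink->errorCount; break;
    case RtValidationSeverity::Warning: severityString = "warning"; break;
    default: break;
    }
    std::string entry = std::string(severityString) + ": [" + messageCode + "] " + message;
    // Details are optional; append them on a new line when present
    if (messageDetails != nullptr && messageDetails[0] != '\0')
        entry += "\n" + std::string(messageDetails);
    sink->entries.push_back(entry);
}
```

When registering the real callback, pass the address of a log object that outlives the registration as pUserData, in the same way the application example passes `&myCallbackData`.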

Flush validation messages to the registered callback with NvAPI_D3D12_FlushRaytracingValidationMessages, for example after waiting on a fence and in your device-removal handler.

Performance optimization

Running an application with ray-tracing validation incurs a performance cost, so applications should never ship with ray-tracing validation enabled. Requiring the NV_ALLOW_RAYTRACING_VALIDATION=1 environment variable acts as a safeguard: if validation is accidentally left enabled in a shipped build, end users, who normally do not have the variable set, do not pay the performance cost.

NvAPI_D3D12_FlushRaytracingValidationMessages reports any validation messages to the registered callback for work that has been completed on the GPU at the time of the call. This enables you to control the granularity of the messages reported. Crucially, explicit flushing also enables the processing of validation messages even after a device removal error.

Application example

The following code example shows how to enable and use ray-tracing validation in an application using NvAPI: 

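#include <windows.h>
#include <d3d12.h>
#include <cstdio>
#include "nvapi.h"

// Application state used below (illustrative names; adapt to your engine)
static bool validationMode = true;                    // enable in development builds only
static ID3D12Device5* device = nullptr;               // ray-tracing capable device
static ID3D12Fence* fence = nullptr;
static HANDLE fenceEvent = nullptr;
static void* nvapiValidationCallbackHandle = nullptr; // filled in on registration
static int myCallbackData = 0;                        // forwarded to the callback as pUserData
ID3D12Device5* MyCreateD3DDevice();                   // hypothetical application-provided helper

// Validation callback: prints each driver message to stderr
static void __stdcall myValidationMessageCallback(void* pUserData, NVAPI_D3D12_RAYTRACING_VALIDATION_MESSAGE_SEVERITY severity, const char* messageCode, const char* message, const char* messageDetails)
{
    const char* severityString = "unknown";
    switch (severity)
    {
    case NVAPI_D3D12_RAYTRACING_VALIDATION_MESSAGE_SEVERITY_ERROR: severityString = "error"; break;
    case NVAPI_D3D12_RAYTRACING_VALIDATION_MESSAGE_SEVERITY_WARNING: severityString = "warning"; break;
    }
    fprintf(stderr, "Ray Tracing Validation message: %s: [%s] %s\n%s\n", severityString, messageCode, message, messageDetails);
    fflush(stderr);
}

// Enable ray-tracing validation right after device creation
void onCreate()
{
    NvAPI_Initialize();
    device = MyCreateD3DDevice();
    if (validationMode)
    {
        // Returns an error status if validation is unavailable,
        // for example when NV_ALLOW_RAYTRACING_VALIDATION=1 is not set
        if (NvAPI_D3D12_EnableRaytracingValidation(device, NVAPI_D3D12_RAYTRACING_VALIDATION_FLAG_NONE) != NVAPI_OK)
        {
            fprintf(stderr, "Could not enable ray-tracing validation\n");
            validationMode = false;
            return;
        }
        NvAPI_D3D12_RegisterRaytracingValidationMessageCallback(device, &myValidationMessageCallback, (void*)&myCallbackData, &nvapiValidationCallbackHandle);
    }
}

// Flush validation messages after waiting for a fence signal
void waitForGPU(UINT64 fenceValue)
{
    fence->SetEventOnCompletion(fenceValue, fenceEvent);
    WaitForSingleObjectEx(fenceEvent, INFINITE, FALSE);
    if (validationMode)
        NvAPI_D3D12_FlushRaytracingValidationMessages(device);
}

// We highly recommend checking for DXGI_ERROR_DEVICE_REMOVED and flushing the validation messages if this occurs
void onDeviceRemoved()
{
    // Flushing after DXGI_ERROR_DEVICE_REMOVED is OK
    if (validationMode)
        NvAPI_D3D12_FlushRaytracingValidationMessages(device);
}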

What is being validated?

Ray-tracing validation reports certain error conditions occurring within DispatchRays and BuildRaytracingAccelerationStructure calls that can lead to hard-to-debug faults or corruptions. 

For example, an application might build a new ray-tracing pipeline for the next frame and overwrite the shader binding table (SBT) while the current frame is still using it. The current frame can then fault when it reaches SBT entries that refer to shaders missing from its pipeline. Ray-tracing validation reports this as an UNKNOWN_ENTRY_FUNCTION error, along with error-specific details that guide you to the root cause.
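One common mitigation for this race, not something ray-tracing validation itself requires, is to keep one SBT copy per frame in flight and overwrite a copy only after the GPU has reached the fence value of the last frame that used it. The portable C++ sketch below models only that bookkeeping. `SbtRing` and its members are hypothetical, and in a real renderer the completed value would come from `ID3D12Fence::GetCompletedValue`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical bookkeeping for N per-frame copies of the shader binding table.
template <std::size_t N>
class SbtRing
{
public:
    // Frames cycle through the N SBT copies.
    static std::size_t slotForFrame(std::uint64_t frameIndex)
    {
        return static_cast<std::size_t>(frameIndex % N);
    }

    // A copy may be rewritten only once the GPU has completed the last frame
    // that read it (completedFenceValue comes from the D3D12 fence).
    bool canWrite(std::uint64_t frameIndex, std::uint64_t completedFenceValue) const
    {
        return completedFenceValue >= lastUseFence_[slotForFrame(frameIndex)];
    }

    // Record the fence value signaled after this frame's DispatchRays work.
    void markSubmitted(std::uint64_t frameIndex, std::uint64_t signaledFenceValue)
    {
        lastUseFence_[slotForFrame(frameIndex)] = signaledFenceValue;
    }

private:
    std::array<std::uint64_t, N> lastUseFence_{}; // 0 means "never used"
};
```

Before rebuilding the SBT for frame `f`, wait until `canWrite(f, completedValue)` is true, then call `markSubmitted` with the value signaled after that frame is submitted.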

Validation checks for DispatchRays include the following:

  • Unexpected SBT record shader type, like encountering a miss shader entry when expecting a hit group.
  • SBT references a shader that is not part of the pipeline.
  • Out-of-bounds SBT entry.
  • Shader payload type mismatch, when the invoked shader expects a type different from the one passed to TraceRay.
  • Maximum trace depth exceeded.
  • Stack overflow.

Validation checks for BuildRaytracingAccelerationStructure include the following:

  • Performance warnings for inefficient acceleration structures.
  • Excessive use of degenerate triangles in refittable bottom-level acceleration structures (BLASes), which may lead to poor performance.
  • Bad vertex data, such as NaNs and extremely large values.
  • Ill-conditioned geometry or instance transforms.
  • Incomplete source acceleration structures used for refitting, copying, or top-level AS (TLAS) builds. This usually indicates missing synchronization between AS operations in the application.
  • Out-of-bounds vertex, OMM, or DMM input indices.
  • Build flags that changed between an AS build and its refit.
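Some of these geometry problems can also be screened on the CPU before upload, which makes it easier to trace a driver warning back to a specific asset. The C++ sketch below is a hypothetical pre-check, not a driver feature. The driver's own thresholds are not stated here, so the `maxAbs` limit is an arbitrary assumption, and only triangles with exactly zero area are counted as degenerate.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

using Float3 = std::array<float, 3>;

struct GeometryReport
{
    std::size_t badVertices = 0;          // NaN, infinity, or |coordinate| > maxAbs
    std::size_t degenerateTriangles = 0;  // exactly zero area
    std::size_t outOfBoundsTriangles = 0; // at least one index >= vertex count
};

// Hypothetical check for an indexed triangle list (3 indices per triangle).
inline GeometryReport checkTriangleMesh(const std::vector<Float3>& vertices,
                                        const std::vector<std::uint32_t>& indices,
                                        float maxAbs = 1.0e6f)
{
    GeometryReport report;
    for (const Float3& v : vertices)
        for (float c : v)
            if (!std::isfinite(c) || std::fabs(c) > maxAbs) { ++report.badVertices; break; }

    for (std::size_t t = 0; t + 2 < indices.size(); t += 3)
    {
        const std::uint32_t i0 = indices[t], i1 = indices[t + 1], i2 = indices[t + 2];
        if (i0 >= vertices.size() || i1 >= vertices.size() || i2 >= vertices.size())
        {
            ++report.outOfBoundsTriangles;
            continue;
        }
        const Float3& a = vertices[i0];
        const Float3& b = vertices[i1];
        const Float3& c = vertices[i2];
        // Edge vectors; the cross product length is twice the triangle area
        const float e1[3] = {b[0] - a[0], b[1] - a[1], b[2] - a[2]};
        const float e2[3] = {c[0] - a[0], c[1] - a[1], c[2] - a[2]};
        const float cx = e1[1] * e2[2] - e1[2] * e2[1];
        const float cy = e1[2] * e2[0] - e1[0] * e2[2];
        const float cz = e1[0] * e2[1] - e1[1] * e2[0];
        if (cx * cx + cy * cy + cz * cz == 0.0f)
            ++report.degenerateTriangles;
    }
    return report;
}
```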

Example output

At the time of publication, here are a few examples of the driver’s validation output. The errors that are caught by the driver will evolve and grow over time.

error: [UNKNOWN_ENTRY_FUNCTION] attempted to execute a shader that is not part of the pipeline
	launch index: [451, 309, 0]
	additional occurrences: 12
	SBT byte offset (if applicable): 1088
	SBT range: likely hitgroup/raygen/callable

error: [UNEXPECTED_SHADER_TYPE] encountered a shader-binding-table record with unexpected shader type
	launch index: [0, 0, 0]
	additional occurrences: 10737
	type: hitgroup
	expected: miss
	SBT byte offset: 64
	SBT GPUVA: 0xb2841c0

error: [HIT_SBT_OUT_OF_BOUNDS] encountered an out-of-bounds access in the shader binding table when accessing a hitgroup record
	launch index: [826, 122, 0]
	additional occurrences: 1067
	SBT byte offset: 7040
	SBT byte size: 6528
	SBT index: 110
	instance SBT base index: 108

warning: [EXCESSIVE_DEGENERATE_PRIMITIVES] Acceleration structure has a significant portion of degenerated primitives, which can lead to poor performance.
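The HIT_SBT_OUT_OF_BOUNDS details can be cross-checked against the hit-group indexing rule in the DirectX Raytracing specification: the record index is RayContributionToHitGroupIndex + MultiplierForGeometryContributionToHitGroupIndex × GeometryContributionToHitGroupIndex + InstanceContributionToHitGroupIndex, and the record's byte offset is that index times the table stride. The reported offset and index (7040 / 110) imply a 64-byte stride, so the 6528-byte table holds 102 records and index 110 lies past its end. The C++ sketch below reproduces that arithmetic. The stride is inferred rather than reported, the function names are hypothetical, and the split of index 110 into an instance base of 108 plus a ray contribution of 2 in the usage below is only one possibility.

```cpp
#include <cstdint>

// Hit-group record index as defined by the DXR specification:
// RayContributionToHitGroupIndex
//   + MultiplierForGeometryContributionToHitGroupIndex * GeometryContributionToHitGroupIndex
//   + InstanceContributionToHitGroupIndex
inline std::uint64_t hitGroupRecordIndex(std::uint32_t rayContribution, std::uint32_t geometryMultiplier,
                                         std::uint32_t geometryIndex, std::uint32_t instanceContribution)
{
    return std::uint64_t{rayContribution} + std::uint64_t{geometryMultiplier} * geometryIndex + instanceContribution;
}

// Byte offset of a record relative to the start of the hit-group table.
inline std::uint64_t hitGroupByteOffset(std::uint64_t recordIndex, std::uint64_t strideInBytes)
{
    return recordIndex * strideInBytes;
}

// Number of whole records in the table (this sketch assumes a non-zero stride).
inline std::uint64_t hitGroupRecordCount(std::uint64_t sizeInBytes, std::uint64_t strideInBytes)
{
    return sizeInBytes / strideInBytes;
}

// True when the index addresses a whole record inside the table.
inline bool hitGroupRecordInBounds(std::uint64_t recordIndex, std::uint64_t strideInBytes, std::uint64_t sizeInBytes)
{
    return recordIndex < hitGroupRecordCount(sizeInBytes, strideInBytes);
}
```

In a case like this, the fix typically involves either enlarging the hit-group table or correcting the instance's InstanceContributionToHitGroupIndex or the contribution values passed to TraceRay.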

Performance

Enabling ray-tracing validation comes at a performance cost. However, this cost is typically quite low, especially when no errors are detected. The performance impact should be acceptable for day-to-day feature development and debugging. As already mentioned, applications should not ship with validation mode enabled.

In the most recent tests against an extensive collection of benchmarks, we observed an average frame-time increase of roughly 3% in cases that did not trigger any errors. In benchmarks that did trigger errors, the overhead ranged from 3% to 40%.
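As a rough illustration, assuming a 60 fps baseline (an example figure, not one taken from the benchmarks), these percentages translate into frame time as follows:

```latex
t_{\text{base}} = \frac{1000\ \text{ms}}{60} \approx 16.67\ \text{ms}
\qquad
t_{+3\%} = 1.03 \times 16.67\ \text{ms} \approx 17.2\ \text{ms}
\qquad
t_{+40\%} = 1.40 \times 16.67\ \text{ms} \approx 23.3\ \text{ms}\ (\approx 43\ \text{fps})
```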

Ray-tracing validation has no effect on ray query–based ray tracing from compute or pixel shaders.

Conclusion

Ray-tracing validation is a powerful tool for debugging troublesome crashes and improving ray-tracing performance. The changes required to enable it are small and simple. 

If you are developing a ray-tracing application, consider adding it to your development ecosystem today. For more information, see the ray-tracing forum.

Source: NVIDIA