In-Game GPU Profiling for DirectX 12 Using SetBackgroundProcessingMode

If you are a DirectX 12 (DX12) game developer, you may have noticed that GPU times displayed in real time in your game HUD may change over time for a given pass. This may be the case even if nothing has changed on the application side. 

One reason for GPU time variations may be GPU Boost dynamically changing the GPU core clock frequency. However, even with GPU Boost disabled using the DX12 SetStablePowerState API, GPU timings measured in-game may change unexpectedly from run to run, or from frame to frame. One factor to consider is whether background driver optimizations were engaged and when their resulting optimized shaders were deployed.

This post provides best practices for performing in-game GPU profiling while monitoring the state of the background driver optimizations, using the DX12 SetBackgroundProcessingMode API on NVIDIA GPUs.

Keep background driver optimizations always on

The DX12 driver automatically disables all of its background optimizations if it detects a risk that their CPU overhead may negatively impact the frame rate of the DX12 application. For instance, running a Debug build of an application may result in less optimized GPU workloads. Even for a Release build, the driver background optimizations may be turned on and off dynamically from frame to frame.

To avoid getting inconsistent profiling results depending on the CPU load of your application, you can request that the driver background optimizations stay always on, even if that may degrade the frame rate. Use the following call (calling it once is enough; there is no need to repeat it every frame):

// pDevice6 is an ID3D12Device6*.
if (FAILED(pDevice6->SetBackgroundProcessingMode(
        D3D12_BACKGROUND_PROCESSING_MODE_ALLOW_INTRUSIVE_MEASUREMENTS,
        D3D12_MEASUREMENTS_ACTION_KEEP_ALL,
        nullptr, nullptr))) {
    // handle error.
}

Wait for background driver optimization threads

Even with driver background optimizations always on, the optimizations typically require multiple frames to collect observations. The observations are then used to compile a shader asynchronously. In contrast, DX12 Create calls block until their compiles complete. This asynchronous delivery of new binaries can result in GPU performance for one shader suddenly changing from one frame to the next without anything changing on the application side.

Understandably, this can cause a great deal of confusion when timing your shaders. You should still aim to measure these background-optimized shaders, so that you do not spend application optimization work on something the driver is already providing.

To know when all background driver optimizations have completed so you can take GPU performance measurements in your in-game profiler, use the following code on Present. Continue to render frames until wantMoreFrames comes back false.

On Present:

BOOL wantMoreFrames = FALSE;
if (FAILED(pDevice6->SetBackgroundProcessingMode(
        D3D12_BACKGROUND_PROCESSING_MODE_ALLOW_INTRUSIVE_MEASUREMENTS,
        D3D12_MEASUREMENTS_ACTION_KEEP_ALL,
        nullptr,
        &wantMoreFrames))) {
    // handle error.
}

Notes:

  • The wantMoreFrames return value combines two pieces of information from the driver: “are background compiles currently running” and “does the driver want more frames demonstrated to the optimizers.”
  • We recommend that you display this Boolean in real time in your game HUD next to your in-game GPU timings.
  • It is possible that wantMoreFrames never becomes false if the driver continues generating new binaries. We recommend that you pause your game time and do not move the camera to avoid this possibility.
  • If the wantMoreFrames Boolean never turns false in your case, even after you have paused all simulations, you can fall back to looking at whether the GPU timings in your game HUD appear to have settled.
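
The waiting strategy described in these notes can be sketched as a small gate that your in-game profiler consults once per frame. This is a minimal sketch under stated assumptions, not a driver-provided facility: MeasurementGate, its 8-frame window, the 600-frame patience value, and the 2% tolerance are all hypothetical names and numbers chosen for illustration. The wantMoreFrames argument is assumed to be the value written by the SetBackgroundProcessingMode call on Present, and gpuMs is assumed to be the GPU time your profiler already measures for the pass of interest.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical helper: decides when in-game GPU timings are safe to record.
class MeasurementGate {
public:
    // fallbackAfterFrames: frames to wait for the driver flag to clear before
    //   falling back to the "timings look settled" heuristic (illustrative).
    // relTolerance: largest (max - min) / min spread over the window that
    //   still counts as settled; 0.02 (2%) is an arbitrary example value.
    explicit MeasurementGate(std::size_t fallbackAfterFrames = 600,
                             double relTolerance = 0.02)
        : fallbackAfterFrames_(fallbackAfterFrames),
          relTolerance_(relTolerance) {}

    // Call once per frame, after Present.
    // Returns true when the current timings can be trusted.
    bool Update(bool wantMoreFrames, double gpuMs) {
        samples_[frames_ % kWindow] = gpuMs;  // ring buffer of recent timings
        ++frames_;
        if (!wantMoreFrames) {
            return true;   // driver reports no pending background work
        }
        if (frames_ < fallbackAfterFrames_) {
            return false;  // keep rendering and give the driver more frames
        }
        return TimingsSettled();  // flag never cleared: use the fallback
    }

    // True when the last kWindow timings stay within the relative tolerance.
    bool TimingsSettled() const {
        if (frames_ < kWindow) {
            return false;  // not enough samples yet
        }
        const auto range =
            std::minmax_element(samples_.begin(), samples_.end());
        const double lo = *range.first;
        const double hi = *range.second;
        return lo > 0.0 && (hi - lo) <= relTolerance_ * lo;
    }

private:
    static constexpr std::size_t kWindow = 8;
    std::array<double, kWindow> samples_{};
    std::size_t frames_ = 0;
    std::size_t fallbackAfterFrames_;
    double relTolerance_;
};
```

The fallback is deliberately delayed: timings can look stable while the driver is still collecting observations, so the sketch trusts the settle heuristic only after the flag has had a generous number of frames to clear. Use one gate per pass and reset it whenever you resume simulation or move the camera.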

Reset the background processing mode to the default mode

Use the following call to return to the default mode of the DX12 driver. In this mode, the driver turns background optimizations on and off depending on internal heuristics.

if (FAILED(pDevice6->SetBackgroundProcessingMode(
        D3D12_BACKGROUND_PROCESSING_MODE_ALLOWED,
        D3D12_MEASUREMENTS_ACTION_KEEP_ALL,
        nullptr, nullptr))) {
    // handle error.
}
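
One way to make sure the reset is never forgotten is to tie both calls to the lifetime of a profiling-session object. The following is a minimal sketch, assuming a hypothetical ScopedProfilingMode class that is not part of DX12. To keep the sketch independent of the Windows headers, the two SetBackgroundProcessingMode calls are passed in as callables rather than made directly.

```cpp
#include <functional>
#include <utility>

// Hypothetical RAII helper: runs `enter` on construction and `leave` on
// destruction, so the default mode is restored on every exit path.
class ScopedProfilingMode {
public:
    ScopedProfilingMode(const std::function<void()>& enter,
                        std::function<void()> leave)
        : leave_(std::move(leave)) {
        if (enter) {
            enter();  // e.g. request ALLOW_INTRUSIVE_MEASUREMENTS
        }
    }

    ~ScopedProfilingMode() {
        if (leave_) {
            leave_();  // e.g. return to D3D12_BACKGROUND_PROCESSING_MODE_ALLOWED
        }
    }

    // Non-copyable: exactly one reset per profiling session.
    ScopedProfilingMode(const ScopedProfilingMode&) = delete;
    ScopedProfilingMode& operator=(const ScopedProfilingMode&) = delete;

private:
    std::function<void()> leave_;
};
```

In an engine, the enter callable would wrap the ALLOW_INTRUSIVE_MEASUREMENTS call shown earlier and the leave callable would wrap the reset call above. The object would typically live as a member of whatever represents an active profiling session, because a session spans many frames.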

Conclusion

For more deterministic performance measurements on NVIDIA GPUs using your DX12 in-game GPU profiler, we recommend that you display the wantMoreFrames Boolean in your game HUD next to your in-game GPU timings to know whether background driver optimizations are in flight.

By using the DX12 SetBackgroundProcessingMode API in your game engine in this way during development, your in-game GPU profiler will provide more reliable information. By using the ALLOW_INTRUSIVE_MEASUREMENTS background processing mode, you should no longer get different GPU timings depending on the CPU load of your game. By waiting for wantMoreFrames to be false, you can make sure that you always look at the GPU performance of the fully optimized shaders.

Source: NVIDIA