Advanced API Performance: CPUs

This post covers CPU best practices when working with NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips.

To get the best performance from your NVIDIA GPU, pair it with efficient work delegation on the CPU. Frame-rate caps, stutter, and other subpar application performance events can often be traced back to a bottleneck on the CPU. Use the following tips to understand what you should do and what to avoid.

Multithreading and workload balancing

No amount of GPU work optimization will overcome a CPU bottleneck. Evenly balance work across all threads for best results.

Not recommended

Do not record CPU-intensive command lists on the thread that calls ExecuteCommandLists. ExecuteCommandLists is typically serialized after command list recording for a given frame. Keeping the submission call on its own thread, separate from all command list recording threads, lets CPU work for the next frame begin sooner and simplifies load balancing.
Avoid fine-grained query use, such as timing around individual draw calls, because each query adds CPU overhead.
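The threading split described above can be sketched in portable C++. This is a minimal illustration, not Direct3D 12 code: RecordedList, Batch, SubmitQueue, and runFrames are hypothetical names, and the comments mark where real recording and the real ExecuteCommandLists call would go. For brevity it spawns recording threads per frame, whereas an engine would more likely keep a persistent worker pool.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a recorded command list.
struct RecordedList {
    int frame;
    int pass;
};

// Hypothetical stand-in for the array passed to ExecuteCommandLists.
using Batch = std::vector<RecordedList>;

// Thread-safe hand-off from the recording side to the submission thread.
class SubmitQueue {
public:
    void push(Batch batch) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batches_.push(std::move(batch));
        }
        ready_.notify_one();
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a batch is available; returns false once closed and drained.
    bool pop(Batch& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !batches_.empty(); });
        if (batches_.empty()) return false;
        out = std::move(batches_.front());
        batches_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Batch> batches_;
    bool closed_ = false;
};

// Records `passes` lists per frame on worker threads and submits each frame
// on a dedicated thread. Assumes frames >= 0 and passes >= 0.
// Returns the frame indices in the order they were submitted.
std::vector<int> runFrames(int frames, int passes) {
    SubmitQueue queue;
    std::vector<int> submitted;  // written only by the submission thread

    // Dedicated submission thread: the only place submission happens.
    std::thread submitter([&queue, &submitted] {
        Batch batch;
        while (queue.pop(batch)) {
            // Real code would call ExecuteCommandLists on the command queue here.
            if (!batch.empty()) submitted.push_back(batch.front().frame);
        }
    });

    for (int frame = 0; frame < frames; ++frame) {
        Batch batch(static_cast<std::size_t>(passes));
        std::vector<std::thread> workers;
        for (int pass = 0; pass < passes; ++pass) {
            workers.emplace_back([&batch, frame, pass] {
                // Real code would record draw calls into this thread's own list.
                batch[static_cast<std::size_t>(pass)] = RecordedList{frame, pass};
            });
        }
        for (std::thread& worker : workers) worker.join();
        // Hand the frame off; recording for the next frame can start right away.
        queue.push(std::move(batch));
    }

    queue.close();
    submitter.join();
    return submitted;
}
```

Calling runFrames(3, 4) records four lists for each of three frames and hands each frame to the submission thread in order. Because the recording loop never waits on submission, it is free to start the next frame while the previous one is still being submitted.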

ExecuteCommandLists and multiple command queues

ExecuteCommandLists (ECL) submits an array of command lists to the GPU for execution. NVIDIA hardware supports multiple command queues to parallelize work, enabling graphics-compute or compute-compute work to be performed concurrently.

Resource allocation and destruction

Creating and destroying buffers, textures, and shaders is fundamental to efficient computer graphics.

Use a dedicated thread for resource creation. Creation calls carry hidden OS costs, such as paging work, that would otherwise block frame rendering.
Free-threaded resource creation may also be a natural fit for async copy queue uploads, which would make data uploads to video memory for newly allocated resources completely free-threaded. Structuring uploads this way avoids adding hidden overhead to frame rendering. However, be aware that additional queues and synchronization between queues may also add CPU overhead.
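One way to structure a dedicated creation thread is sketched below in portable C++. Resource, ResourceCreator, and createAsync are hypothetical names chosen for illustration; the comment marks where a real creation call such as CreateCommittedResource, and optionally a copy queue upload, would run.

```cpp
#include <condition_variable>
#include <cstddef>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a created buffer or texture.
struct Resource {
    std::size_t sizeInBytes;
};

// Owns one background thread that performs all resource creation.
class ResourceCreator {
public:
    ResourceCreator() : worker_([this] { run(); }) {}

    ~ResourceCreator() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();  // pending requests are drained before the thread exits
    }

    // Called from the render thread; queues the request and returns immediately.
    std::future<Resource> createAsync(std::size_t sizeInBytes) {
        std::packaged_task<Resource()> task([sizeInBytes] {
            // Real code would create the resource here (for example with
            // CreateCommittedResource) and could schedule its upload.
            return Resource{sizeInBytes};
        });
        std::future<Resource> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<Resource()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // the expensive work runs outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::packaged_task<Resource()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

The render thread keeps the returned future and picks up the resource in a later frame, so any OS cost inside the creation call is paid on the background thread. The single mutex and condition variable here are an example of the synchronization overhead the tip above warns about; it stays small as long as requests are batched rather than issued per draw.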

BuildRaytracingAccelerationStructure

Ray tracing acceleration structures are data structures that organize the geometric information of a scene to optimize the intersection tests between rays and scene objects. BuildRaytracingAccelerationStructure performs the initial construction of the acceleration structure with the scene geometry.

Record on a separate thread when using BuildRaytracingAccelerationStructure, preferably scheduling on an async compute queue. This API is CPU-intensive and can dominate command list recording time.
Be wary of CPU overhead for full builds, which scales directly with geometric complexity. Rebuilds should have relatively fixed overhead.
Be aware of the extra CPU overhead associated with FAST_TRACE builds.

For more information, see Best Practices: Using NVIDIA RTX Ray Tracing.

CreatePipelineState and CreateStateObject

CreatePipelineState is used to create a rendering pipeline state object that defines the configuration of the graphics pipeline. The pipeline state object encapsulates all of the state required to execute a graphics command, such as the input layout, shader programs, blending state, depth-stencil state, and rasterizer state.

CreateStateObject creates a state object from a collection of subobjects. In Direct3D 12, it is used chiefly to build ray tracing pipeline state objects, which bundle shader libraries, hit groups, and related pipeline configuration. The viewport and scissor rectangle are not part of either kind of state object; they are set on the command list.

Not recommended

Avoid needlessly creating pipeline state objects and ray tracing objects. These involve shader creation, which can consume substantial CPU cycles. Shader complexity directly affects creation-call complexity.
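A common way to avoid redundant creation is to cache pipeline objects by a key derived from their description. The sketch below assumes a string key and uses hypothetical names (Pipeline, PipelineCache, getOrCreate); the comment marks where the expensive CreatePipelineState or CreateStateObject call would sit. It is not thread-safe as written and would need a lock if called from several threads.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a compiled pipeline state object.
struct Pipeline {
    std::string key;
};

class PipelineCache {
public:
    // Returns the cached pipeline for `key`, creating it only on first request.
    std::shared_ptr<Pipeline> getOrCreate(const std::string& key) {
        auto found = cache_.find(key);
        if (found != cache_.end()) return found->second;

        // Real code would make the expensive creation call here; shader
        // compilation makes this the part worth skipping.
        ++creations_;
        auto pipeline = std::make_shared<Pipeline>(Pipeline{key});
        cache_.emplace(key, pipeline);
        return pipeline;
    }

    // Number of times a pipeline was actually created.
    int creations() const { return creations_; }

private:
    std::unordered_map<std::string, std::shared_ptr<Pipeline>> cache_;
    int creations_ = 0;
};
```

Requesting the same key twice returns the same object and increments the creation count only once. In practice the key would be a hash of the full pipeline description rather than a name.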

Source: NVIDIA
