Advanced API Performance: Synchronization

[Figure: a computer sending code to multiple stacks.]

Synchronization in graphics programming refers to the coordination and control of concurrent operations to ensure the correct and predictable execution of rendering tasks. Improper synchronization across the CPU and GPU can lead to slow performance, race conditions, and visual artifacts.

Recommended

  • If running workloads asynchronously, make sure that they stress different GPU units. For example, pair bandwidth-heavy tasks with math-heavy tasks, such as a z-prepass alongside a BVH build or post-processing pass (see the first sketch after this list).
  • Always verify that the asynchronous implementation is actually faster across different GPU architectures.
  • Asynchronous work can belong to different frames; overlapping work across frame boundaries can help find better-paired workloads.
  • Wait on and signal the absolute minimum number of semaphores/fences. Every unnecessary semaphore/fence can introduce a bubble in the pipeline.
  • Use GPU profiling tools (NVIDIA Nsight Graphics in GPU Trace mode, PIX, or GPUView) to see how well work overlaps and whether any fence stalls one queue or another.
  • To avoid extra synchronization and resource barriers, asynchronous copy/transfer work can be done on the compute queue (see the second sketch after this list).
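
As a rough illustration of the pairing and minimal-fence points above, the D3D12 sketch below submits a bandwidth-heavy graphics pass and a math-heavy async compute pass on separate queues and synchronizes them with a single fence signal/wait pair. This is not code from the article; the device, command lists, fence, and function names are placeholder assumptions, with setup assumed to happen elsewhere.

```cpp
// Minimal sketch: paired async compute work with a single fence signal/wait.
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Hypothetical helper: create a dedicated async compute queue once at startup.
ComPtr<ID3D12CommandQueue> CreateAsyncComputeQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;   // async compute queue
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}

// Hypothetical per-frame submission: the two workloads stress different GPU
// units (bandwidth vs. math), so they are good candidates for overlap.
void SubmitPairedWork(ID3D12CommandQueue* graphicsQueue,
                      ID3D12CommandQueue* computeQueue,
                      ID3D12CommandList*  zPrepassList,   // bandwidth-heavy
                      ID3D12CommandList*  bvhBuildList,   // math-heavy
                      ID3D12Fence*        fence,
                      UINT64&             fenceValue)
{
    graphicsQueue->ExecuteCommandLists(1, &zPrepassList);
    computeQueue->ExecuteCommandLists(1, &bvhBuildList);

    // Exactly one signal/wait pair: the graphics queue blocks only at the
    // point where it actually consumes the async compute results.
    computeQueue->Signal(fence, ++fenceValue);
    graphicsQueue->Wait(fence, fenceValue);
}
```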
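
Along the same lines, a sketch of the copy-on-compute-queue point: copy commands are valid on a compute command list, so a transfer can ride along with existing async compute work and reuse that queue's synchronization instead of introducing a dedicated copy queue plus another fence. The function and resource names are hypothetical, and the resources are assumed to already be in states that allow the copy.

```cpp
// Minimal sketch: record a transfer on the async compute command list.
#include <d3d12.h>

void RecordUploadOnComputeList(ID3D12GraphicsCommandList* computeList,
                               ID3D12Resource*            destination,
                               ID3D12Resource*            uploadStaging)
{
    // Recorded on the compute command list: no extra queue and no extra
    // cross-queue fence is introduced just for the transfer.
    computeList->CopyResource(destination, uploadStaging);
}
```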

Not recommended

  • Do not create queues that you don’t use.
    • Each additional queue adds processing overhead.
    • Multiple asynchronous compute queues will not overlap, due to the OS scheduler, unless hardware scheduling is enabled. For more information, see Hardware Accelerated GPU Scheduling.
  • Avoid tiny asynchronous tasks and group them where possible (see the sketch after this list). Asynchronous workloads that take very little GPU time may not recoup the synchronization overhead of scheduling them.
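
As a sketch of that grouping advice (the function and task struct below are hypothetical, and pipeline/root-signature setup is assumed to happen elsewhere), many small dispatches can be recorded into one command list and submitted with a single fence signal, rather than paying a submission and a fence per tiny task.

```cpp
// Minimal sketch: batch small compute tasks into one submission + one signal.
#include <d3d12.h>
#include <vector>

struct SmallComputeTask
{
    UINT groupsX, groupsY, groupsZ;   // thread-group counts for one dispatch
};

void SubmitBatchedComputeTasks(ID3D12GraphicsCommandList*           computeList,
                               ID3D12CommandQueue*                  computeQueue,
                               ID3D12Fence*                         fence,
                               UINT64&                              fenceValue,
                               const std::vector<SmallComputeTask>& tasks)
{
    // All small dispatches go back to back into a single command list.
    for (const SmallComputeTask& task : tasks)
        computeList->Dispatch(task.groupsX, task.groupsY, task.groupsZ);
    computeList->Close();

    // One submission and one fence signal for the whole group.
    ID3D12CommandList* lists[] = { computeList };
    computeQueue->ExecuteCommandLists(1, lists);
    computeQueue->Signal(fence, ++fenceValue);
}
```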

    Source: NVIDIA