Advanced API Performance: Swap Chains

Swap chains are an integral part of how you get rendering data output to a screen. They usually consist of some group of output-ready buffers, each of which can be rendered to one at a time in rotation. In parallel with rendering to one of a swap chain’s buffers, some other buffer in the swap chain is generally read from for display output.

This post covers best practices when working with swap chains on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips.

It’s common to focus on the more frequently optimized parts of the rendering pipeline when looking to improve rendering performance. However, swap chains are often overlooked, leaving performance and latency gains on the table.

The following suggestions and considerations should provide more insight into the best ways to ensure optimum swap chain performance.

Recommended

  • Use flip-model swap chains. This is especially important for leveraging multiplane overlay support, which provides fullscreen-like performance and latency when running in windowed mode.
  • Use SetFullScreenState(TRUE), a (borderless) fullscreen window, and a non-windowed flip-model swap chain to switch to true immediate independent flip mode.
    • This is the only mode that enables unlimited frame rates with tearing for Direct3D 12 when calling Present(0,0).
    • For proper unlimited frame rate support on displays that support variable refresh rates, you must also use the DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING swap chain flag, along with the DXGI_PRESENT_ALLOW_TEARING Present flag.
  • Use the DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH flag consciously.
    • The flag is not necessary to achieve unlimited frame rates (see the earlier note) if your window size matches the current screen resolution.
    • If this flag is set, changing the resolution using ResizeTarget before calling SetFullScreenState(TRUE) works as expected, and the frame rate is unlimited.
    • If this flag is not set, changing the resolution using ResizeTarget before calling SetFullScreenState(TRUE) results in no change of display resolution. Your target is stretched to the current resolution and the frame rate is limited.
  • If not in the fullscreen state (true immediate independent flip mode), carefully control your swap chain's buffer count and maximum frame latency to reach the desired frame rate and latency.
    • Use IDXGISwapChain2::SetMaximumFrameLatency(MaxLatency) to set the desired latency, where MaxLatency is some number of frames (counted by queued Present calls).
    • For this to work, you must create your swap chain with the DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT flag set.
    • DXGI starts to block in Present after you have the MaxLatency number of Present calls queued.
    • In this windowed state, a sync interval of 0 in a Present call ensures that the presented frame is the most recent one available the next time desktop composition happens (when the windowed frame is combined with the rest of the desktop). All previously completed frames are discarded in favor of this latest one. No rendered frame is displayed until composition happens, which is at VSYNC time.
  • Use about 1-2 more swap chain buffers than the maximum number of frames that you intend to queue (in terms of command allocators, dynamic data, and the associated frame fences). Set the maximum frame latency to this number of swap chain buffers through IDXGISwapChain2::SetMaximumFrameLatency(MaxLatency).
    • This ensures that you can limit queued frames and latency explicitly and optimally from within the application logic, rather than relying on the OS to block, possibly at an unexpected time.
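
As a rough illustration of the buffer-count guideline above, the following portable C++ sketch computes the values you might pass at swap chain creation and to SetMaximumFrameLatency. It makes no DXGI calls. SwapChainPlan, planSwapChain, and appShouldWaitOnFence are hypothetical names introduced here for illustration, and the default of one extra buffer is an assumption taken from the low end of the suggested range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical planning result; none of these names come from DXGI.
struct SwapChainPlan {
    std::uint32_t framesInFlight;   // frames the app queues (allocators, dynamic data, fences)
    std::uint32_t bufferCount;      // swap chain buffers to request at creation
    std::uint32_t maxFrameLatency;  // value intended for SetMaximumFrameLatency
};

// Applies the rule of thumb from the list above: request one or two more
// swap chain buffers than the number of frames the application queues, and
// set the maximum frame latency to that buffer count.
inline SwapChainPlan planSwapChain(std::uint32_t framesInFlight,
                                   std::uint32_t extraBuffers = 1) {
    assert(framesInFlight >= 1 && "the application must queue at least one frame");
    assert(extraBuffers >= 1 && extraBuffers <= 2 && "the guideline suggests 1-2 extra buffers");
    const std::uint32_t bufferCount = framesInFlight + extraBuffers;
    return SwapChainPlan{framesInFlight, bufferCount, bufferCount};
}

// Models application-side throttling: once framesInFlight frames are
// outstanding, the app waits on its oldest frame fence before recording
// another frame, so it reaches its own limit before the OS-side limit.
inline bool appShouldWaitOnFence(const SwapChainPlan& plan,
                                 std::uint32_t framesOutstanding) {
    return framesOutstanding >= plan.framesInFlight;
}
```

With two frames in flight, planSwapChain(2) yields three buffers and a maximum frame latency of three. In this model the application waits on its own fence at two outstanding frames, which is below the point where Present would be expected to block.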

Not recommended

  • Forgetting that, by default, there’s a per–swap chain limit of three queued frames before DXGI starts to block in Present. This means that it blocks on the fourth Present call if there are currently three Present calls queued.
    • Set the DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT flag on swap chain creation and use IDXGISwapChain2::SetMaximumFrameLatency to modify this default value.
  • Forgetting to call ResizeBuffers after you have switched to true immediate independent flip mode using SetFullScreenState(TRUE).
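
To make the default queue limit concrete, here is a small portable C++ model of the blocking rule described above. It is only a sketch of the counting behavior, not of DXGI itself. PresentQueueModel and its methods are hypothetical names, and a real Present call blocks rather than returning false.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the per-swap-chain Present queue. It only counts
// queued Present calls and reports when another call would have to block.
class PresentQueueModel {
public:
    // Defaults to three queued frames, the default limit stated in the text.
    explicit PresentQueueModel(std::uint32_t maxLatency = 3)
        : maxLatency_(maxLatency), queued_(0) {
        assert(maxLatency >= 1 && "at least one frame must be allowed in the queue");
    }

    // True when maxLatency Present calls are already queued.
    bool presentWouldBlock() const { return queued_ >= maxLatency_; }

    // Queues a Present if there is room. Returns false where the real call
    // would block instead of returning.
    bool tryPresent() {
        if (presentWouldBlock()) {
            return false;
        }
        ++queued_;
        return true;
    }

    // Models one queued frame leaving the queue after it has been displayed.
    void retireFrame() {
        if (queued_ > 0) {
            --queued_;
        }
    }

    std::uint32_t queued() const { return queued_; }

private:
    std::uint32_t maxLatency_;
    std::uint32_t queued_;
};
```

With the default limit, three tryPresent calls succeed and the fourth reports that it would block until retireFrame frees a slot. Constructing the model with a different value stands in for changing the limit through SetMaximumFrameLatency.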

Acknowledgments

Thanks to Cody Robson, Kumaresan Gnanasekaran, Adrian Muntianu, and Meenal Nachnani for their advice and assistance.

Source: NVIDIA