AMD FidelityFX™ Parallel Sort

Features

State-of-the-art algorithm

Optimized for Shader Model 6.0+

Open source, MIT license

Additional features:

Direct and Indirect execution support.

Algorithm optimized for AMD RDNA™ architecture and above.

32-bit key and payload sort support.

Support for DirectX® 12 API and Vulkan®.

Shaders written in HLSL utilizing Shader Model 6.0 wave-level operations.

A sample application is provided for DirectX® 12 and Vulkan®.

Details

Algorithm overview

AMD FidelityFX™ Parallel Sort is an AMD RDNA™ architecture-optimized version of the Radix Sort algorithm.

At a high level, the algorithm works by iterating over a data set to be sorted (keys or key/value pairs), and re-arranging it in 4-bit increments. Each pass guarantees that the data set is fully sorted up to the number of bits processed so far. For example, after 4 iterations, the first 16 bits of each key are guaranteed to be properly sorted.
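The per-pass guarantee can be checked with a small CPU sketch. It is not the shader code: it assumes the passes start at the least significant 4 bits (so "the first 16 bits" means the low 16 bits), it writes to a separate output list rather than rearranging in place, and the names `sort_by_digit` and `low_bits_sorted` are hypothetical helpers introduced for illustration.

```python
def sort_by_digit(keys, shift, bits=4):
    """Stable counting sort on the `bits`-wide digit that starts at bit `shift`."""
    radix = 1 << bits
    mask = radix - 1

    # Count how many keys fall into each bucket (0-15 for 4-bit digits).
    counts = [0] * radix
    for k in keys:
        counts[(k >> shift) & mask] += 1

    # Exclusive prefix sum: first output slot for each bucket.
    offsets = [0] * radix
    running = 0
    for d in range(radix):
        offsets[d] = running
        running += counts[d]

    # Scatter each key to its slot; input order is kept within a bucket.
    out = [0] * len(keys)
    for k in keys:
        d = (k >> shift) & mask
        out[offsets[d]] = k
        offsets[d] += 1
    return out


def low_bits_sorted(keys, nbits):
    """True if the keys are in non-decreasing order when only the low `nbits` are compared."""
    mask = (1 << nbits) - 1
    low = [k & mask for k in keys]
    return all(a <= b for a, b in zip(low, low[1:]))


keys = [0xBEEF1234, 0x0000FFFF, 0x12340001, 0xCAFE00F0, 0x00010010]
for p in range(4):
    keys = sort_by_digit(keys, 4 * p)
    # After pass p, the low 4 * (p + 1) bits are in sorted order.
    assert low_bits_sorted(keys, 4 * (p + 1))
```

After the four passes the list is ordered by the low 16 bits only; the upper 16 bits are still unsorted until the remaining passes run.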

For each iteration that is executed, 5 actions are taken on the data set:

The 4-bit values currently being sorted are counted into buckets 0-15, so that we know how many times each value occurs throughout the data set.

The occurrence counts go through a reduction phase so that offsets can later be pre-incremented on a per-thread-group basis.

The reduced counts go through a prefix scan to calculate offset values for each value group (0-15) on a per-thread-group basis.

The full occurrence buffer then also goes through a prefix scan, adding the reduced prefix-scan values so that the data is indexed correctly across all thread groups.

The data set is read one more time, and each element is written to its new sorted offset location. If there is a payload, it is copied over at the same time.
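The five steps above can be sketched on the CPU as a single pass over slices of the input that stand in for thread groups. This is a simplified illustration under stated assumptions, not the shipped HLSL: `radix_pass` and `group_size` are hypothetical names, the reduction collapses all groups in one step (the shader's buffer layout may differ), and the output goes to separate lists.

```python
RADIX = 16  # 4-bit digits give buckets 0-15


def radix_pass(keys, shift, payload=None, group_size=4):
    """One 4-bit pass; each slice of `group_size` elements stands in for a thread group."""
    n = len(keys)
    groups = [range(s, min(s + group_size, n)) for s in range(0, n, group_size)]

    def digit(k):
        return (k >> shift) & (RADIX - 1)

    # Step 1: per-group histogram of the current 4-bit digit.
    counts = [[0] * RADIX for _ in groups]
    for g, idxs in enumerate(groups):
        for i in idxs:
            counts[g][digit(keys[i])] += 1

    # Step 2: reduce the histograms to one total per digit.
    totals = [sum(c[d] for c in counts) for d in range(RADIX)]

    # Step 3: exclusive prefix scan of the totals gives the base offset of each digit.
    digit_base = [0] * RADIX
    running = 0
    for d in range(RADIX):
        digit_base[d] = running
        running += totals[d]

    # Step 4: scan the full counts across groups and add the digit base,
    # giving each group its own starting offset for every digit.
    offsets = [[0] * RADIX for _ in groups]
    for d in range(RADIX):
        running = digit_base[d]
        for g in range(len(groups)):
            offsets[g][d] = running
            running += counts[g][d]

    # Step 5: re-read the data and scatter keys (and payload, if any).
    out_keys = [0] * n
    out_payload = [None] * n if payload is not None else None
    for g, idxs in enumerate(groups):
        for i in idxs:
            d = digit(keys[i])
            dst = offsets[g][d]
            offsets[g][d] += 1
            out_keys[dst] = keys[i]
            if payload is not None:
                out_payload[dst] = payload[i]
    return out_keys, out_payload


keys = [3, 1, 3, 0, 2, 1, 0, 3, 2]
payload = list("abcdefghi")
sorted_keys, sorted_payload = radix_pass(keys, shift=0, payload=payload)
```

Because every group receives a disjoint offset range per digit, the groups could scatter independently, which is the property the two scans exist to provide. The pass is also stable, so elements with equal digits keep their input order.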

Once all iterations have run (8 passes in the case of 32-bit keys), the entire data set is sorted.
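The pass count follows from dividing the key width by the bits handled per pass: 32 / 4 = 8. A minimal sketch of the full loop, assuming unsigned keys and least-significant-digit-first ordering, with `num_passes` and `radix_sort` as hypothetical names and plain Python lists standing in for GPU buffers:

```python
def num_passes(key_bits=32, bits_per_pass=4):
    """Number of passes needed to cover every bit of the key (rounded up)."""
    return -(-key_bits // bits_per_pass)


def radix_sort(keys, key_bits=32, bits_per_pass=4):
    """Sort unsigned integer keys by running one stable bucket pass per digit."""
    radix = 1 << bits_per_pass
    src = list(keys)
    for p in range(num_passes(key_bits, bits_per_pass)):
        shift = p * bits_per_pass
        # Stable distribution into buckets 0..radix-1 for the current digit.
        buckets = [[] for _ in range(radix)]
        for k in src:
            buckets[(k >> shift) & (radix - 1)].append(k)
        # Concatenating the buckets in order yields the input for the next pass.
        src = [k for bucket in buckets for k in bucket]
    return src


data = [0xDEADBEEF, 7, 0x00010000, 0xFFFFFFFF, 0, 0x0000FFFF]
result = radix_sort(data)
```

Each pass must be stable for the final result to be correct, since later passes rely on the order established by earlier ones.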

Comparison: GPU particle sorting

Version history

The AMD FidelityFX™ SDK 1.1.4 is a patch release that includes additions to the API and fixes for issues discovered with AMD FSR 3.1.0 to 3.1.3.
Exposed 4 new tunings to reduce AMD FSR upscaler ghosting in newly disoccluded pixels or highly reactive pixels.
Changed the default value of fMinDisocclusionAccumulation to -0.333 (from the equivalent of 0.333 in AMD FSR 3.1.3) to reduce disocclusion ghosting.
Added ffxQueryGetProviderVersion to get version info from created ffx-api context.
Exposed ffxDispatchDescFrameGenerationPrepareCameraInfo as a linked struct. It is a required input to AMD FSR 3.1.4 and onwards for best quality.
Added frame generation debug checker support.
Dropped unused interpolation command lists if frame generation callback fails, to fix infinite wait at swapchain destruction.
General fixes to Vulkan® Frame Interpolation Swapchain.
General framework fixes and updates.
Frame pacing debug line added to Vulkan®.
Added a new FFX error for when creating the frame interpolation swapchain results in E_ACCESSDENIED in DX12 due to overlay or capture software.
Enabled support for frame interpolation swapchain on Windows® 10 1909 and potentially earlier versions.
Fixed flipped disocclusion factor from previous and current backbuffer.
Fixed HDR mode issues in Cauldron sample.
Fixed MSVC C compile errors when including ffx-api.
The AMD FidelityFX™ SDK 1.1.4 also updates the following to address select issues:
AMD FidelityFX™ Brixelizer GI 1.0.1
AMD FidelityFX™ Breadcrumbs 1.0.1