AMD FidelityFX Parallel Sort provides an open source header implementation to easily integrate a highly optimized compute-based radix sort into your game.
The latest version is v1.1. This release adds the following features:
- Vulkan® implementation.
- Upgraded framework to Cauldron v1.4.
- General code cleanup and readability improvements.
Optimized for Shader Model 6.0+
Open source, MIT license
- Direct and Indirect execution support.
- RDNA™+ optimized algorithm.
- 32-bit key and payload sort support.
- Support for DirectX®12 API and Vulkan®.
- Shaders written in HLSL utilizing Shader Model 6.0 wave-level operations.
A sample application is provided for DirectX®12 and Vulkan®.
How it works
AMD FidelityFX Parallel Sort is an RDNA™-optimized version of the Radix Sort algorithm.
At a high level, the algorithm works by iterating over the data set to be sorted (keys or key/value pairs) and re-arranging it in place 4 bits at a time, starting from the least significant bits. Each pass guarantees that the data set is fully sorted up to the number of bits processed. For example, after 4 iterations, the lowest 16 bits of every key are guaranteed to be properly sorted.
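The "sorted up to the number of bits processed" guarantee can be checked on the CPU with a small sketch. This is not the library's code: `digitAt`, `sortByDigit` and `lowBitsSorted` are hypothetical helper names, passes are assumed to start at the least significant 4 bits, and `std::stable_sort` stands in for the GPU passes purely to illustrate the invariant.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: the 4-bit value a given pass looks at
// (pass 0 = lowest 4 bits, pass 7 = highest 4 bits of a 32-bit key).
inline uint32_t digitAt(uint32_t key, int pass) {
    return (key >> (4 * pass)) & 0xFu;
}

// One pass: stable reorder by the current 4-bit value. Stability is what
// preserves the ordering established by the earlier passes.
inline void sortByDigit(std::vector<uint32_t>& keys, int pass) {
    std::stable_sort(keys.begin(), keys.end(),
                     [pass](uint32_t a, uint32_t b) {
                         return digitAt(a, pass) < digitAt(b, pass);
                     });
}

// True if the keys are in non-decreasing order when only their lowest
// `bits` bits are compared.
inline bool lowBitsSorted(const std::vector<uint32_t>& keys, int bits) {
    const uint32_t mask = (bits >= 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    return std::is_sorted(keys.begin(), keys.end(),
                          [mask](uint32_t a, uint32_t b) {
                              return (a & mask) < (b & mask);
                          });
}
```

Calling `sortByDigit` for passes 0 through 3 should leave `lowBitsSorted(keys, 16)` true, and continuing through pass 7 should leave the keys fully sorted.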
For each iteration that is executed, 5 actions are taken on the data set:
- The 4-bit values currently being sorted are counted into buckets 0-15, so that we know how many times each value occurs throughout the data set.
- The occurrence counts go through a reduction phase, so that offsets can be pre-incremented on a thread group basis later on.
- The reduced counts go through a prefix scan to calculate offset values for each value group (0-15) on a thread group basis.
- The full occurrence buffer then also goes through a prefix scan, adding the reduced prefix-scan values so that the data is indexed properly across all thread groups.
- The data set is read in one more time, and written to its new sorted offset location. If there is also a payload, it is also copied over at this time.
Once all iterations have run (8 iterations for 32-bit keys, at 4 bits per iteration), the entire data set is sorted.
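As a rough end-to-end illustration, a 32-bit key and payload sort can be written as 8 passes of 4 bits each. The sketch below is a single-threaded CPU stand-in with a hypothetical function name, `radixSortKeyPayload`, and it assumes a pair of temporary buffers that are swapped after every pass; it does not describe how the library manages its own buffers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts `keys` in ascending order and applies the same permutation to
// `payloads`. Both vectors must have the same length.
void radixSortKeyPayload(std::vector<uint32_t>& keys,
                         std::vector<uint32_t>& payloads) {
    constexpr uint32_t kBitsPerPass = 4;
    constexpr uint32_t kPasses = 32 / kBitsPerPass;  // 8 passes for 32-bit keys

    // Assumed scratch buffers, swapped with the inputs after each pass.
    std::vector<uint32_t> keysTmp(keys.size());
    std::vector<uint32_t> payloadsTmp(payloads.size());

    for (uint32_t pass = 0; pass < kPasses; ++pass) {
        const uint32_t shift = pass * kBitsPerPass;

        // Count how often each 4-bit value occurs.
        std::array<std::size_t, 16> offset{};
        for (uint32_t k : keys) ++offset[(k >> shift) & 0xFu];

        // Exclusive prefix sum: turn counts into starting offsets.
        std::size_t running = 0;
        for (std::size_t& o : offset) {
            const std::size_t count = o;
            o = running;
            running += count;
        }

        // Scatter keys, copying each payload along with its key.
        for (std::size_t i = 0; i < keys.size(); ++i) {
            const std::size_t d = offset[(keys[i] >> shift) & 0xFu]++;
            keysTmp[d] = keys[i];
            payloadsTmp[d] = payloads[i];
        }
        keys.swap(keysTmp);
        payloads.swap(payloadsTmp);
    }
}
```

Equal keys keep their original relative order, so payloads attached to duplicate keys come out in the order they went in.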
Comparison: GPU particle sorting
Comparison: image index buffer sorting
More AMD FidelityFX effects
AMD FidelityFX – Super Resolution 2 (FSR 2)
Learn even more about our new open source temporal upscaling solution FSR 2, and get the source code and documentation!
AMD FidelityFX – Super Resolution 1.0
AMD FidelityFX Super Resolution (FSR) is our open-source, high-quality, high-performance upscaling solution.
AMD FidelityFX – Variable Shading
AMD FidelityFX Variable Shading drives Variable Rate Shading into your game.
AMD FidelityFX – Denoiser
AMD FidelityFX Denoiser is a set of denoising compute shaders which remove artefacts from reflection and shadow rendering.
AMD FidelityFX – Luminance Preserving Mapper
AMD FidelityFX LPM provides an open source library to easily integrate HDR and wide gamut tone and gamut mapping into your game.
AMD FidelityFX – Stochastic Screen Space Reflections
The AMD FidelityFX SSSR effect provides an open source library to easily integrate stochastic screen space reflections into your game.
AMD FidelityFX – Combined Adaptive Compute Ambient Occlusion
AMD FidelityFX Combined Adaptive Compute Ambient Occlusion (CACAO) is an RDNA-optimized implementation of ambient occlusion.
AMD FidelityFX – Single Pass Downsampler
FidelityFX Single Pass Downsampler (SPD) provides an RDNA-optimized solution for generating up to 12 MIP levels of a texture.
Radeon™ Cauldron is our open source experimentation framework for DirectX®12 and Vulkan®.
AMD FidelityFX – Contrast Adaptive Sharpening
AMD FidelityFX Contrast Adaptive Sharpening (CAS) provides a mixed ability to sharpen and optionally scale an image.
Our other effects
A multithreaded CPU library for deformable material physics, using the Finite Element Method (FEM)
The DepthOfFieldFX library provides a GCN-optimized Compute Shader implementation of Depth of Field using the Fast Filter Spreading approach.
GeometryFX improves the rasterizer efficiency by culling triangles that do not contribute to the output in a pre-pass. This allows the full chip to be used to process geometry, and ensures that the rasterizer only processes triangles that are visible.
The ShadowFX library provides a scalable GCN-optimized solution for deferred shadow filtering. It supports uniform and contact hardening shadow (CHS) kernels.
The TressFX library is AMD’s hair/fur rendering and simulation technology. TressFX is designed to use the GPU to simulate and render high-quality, realistic hair and fur.