AMD FidelityFX Parallel Sort provides an open source header implementation to easily integrate a highly optimized compute-based radix sort into your game.

Supports:

  • DirectX®12
  • Vulkan®

The latest version is v1.1.

This release adds the following features:

  • Vulkan® implementation.
  • Upgraded framework to Cauldron v1.4.
  • General code cleanup and readability improvements.

Features

State-of-the-art algorithm

Optimized for Shader Model 6.0+

Open source, MIT license

Additional features:

  • Direct and Indirect execution support.
  • RDNA™+ optimized algorithm.
  • 32-bit key and payload sort support.
  • Support for DirectX®12 API and Vulkan®.
  • Shaders written in HLSL utilizing Shader Model 6.0 wave-level operations.
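Shader Model 6.0 wave-level operations let the lanes of a wave exchange data without a round trip through groupshared memory. As a rough CPU-side illustration, the Python sketch below emulates two HLSL intrinsics, `WavePrefixSum` (an exclusive prefix sum across lanes) and `WaveActiveSum`, and combines them into a thread-group-wide exclusive scan, the kind of building block a radix sort uses when computing offsets. The Python function names and the wave size of 32 are assumptions for illustration; this is not the library's shader code.

```python
# CPU emulation of two HLSL Shader Model 6.0 wave intrinsics (illustrative only).
WAVE_SIZE = 32  # assumption: RDNA supports wave32 and wave64; 32 is used here


def wave_prefix_sum(lanes):
    """Exclusive prefix sum across one wave, like HLSL WavePrefixSum."""
    out, running = [], 0
    for x in lanes:
        out.append(running)  # each lane sees the sum of lower-indexed lanes
        running += x
    return out


def wave_active_sum(lanes):
    """Sum across all lanes of one wave, like HLSL WaveActiveSum."""
    return sum(lanes)


def block_exclusive_scan(values, wave_size=WAVE_SIZE):
    """Exclusive scan over a whole thread group, built from per-wave scans.

    1. Each wave scans its own lanes with wave_prefix_sum.
    2. Each wave's total (wave_active_sum) is stored, as if in shared memory.
    3. The totals are scanned, and each wave adds its resulting offset.
    """
    waves = [values[i:i + wave_size] for i in range(0, len(values), wave_size)]
    totals = [wave_active_sum(w) for w in waves]
    wave_offsets = wave_prefix_sum(totals)
    result = []
    for wave, offset in zip(waves, wave_offsets):
        result.extend(offset + p for p in wave_prefix_sum(wave))
    return result


print(block_exclusive_scan([3, 1, 4, 1, 5], wave_size=2))
```

Splitting the scan into a per-wave step and a small scan over wave totals mirrors how GPU code typically keeps most of the work inside a wave, where lanes can communicate cheaply.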

A sample application is provided for DirectX®12 and Vulkan®.

How it works

Algorithm Overview

AMD FidelityFX Parallel Sort is an RDNA™-optimized version of the radix sort algorithm.

At a high level, the algorithm iterates over the data set to be sorted (keys or key/value pairs) and re-arranges it 4 bits at a time, starting with the least significant bits. After each pass, the data set is sorted with respect to all of the bits processed so far. For example, after 4 iterations, the data is guaranteed to be correctly sorted by the lowest 16 bits of the key.
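The pass-by-pass behavior can be illustrated with a sequential least-significant-digit radix sort. The Python sketch below is a CPU stand-in, not the GPU implementation: each pass is a stable bucket sort on one 4-bit digit (in place of the GPU's count, scan, and scatter steps), and the assertions check that 4 passes order the keys by their lowest 16 bits and that 8 passes fully sort 32-bit keys. The function names are hypothetical.

```python
import random

RADIX_BITS = 4              # bits sorted per pass
RADIX = 1 << RADIX_BITS     # 16 possible digit values (0-15)


def radix_pass(keys, shift):
    """Stable bucket-sort pass on the 4-bit digit starting at bit `shift`."""
    buckets = [[] for _ in range(RADIX)]
    for k in keys:
        buckets[(k >> shift) & (RADIX - 1)].append(k)
    return [k for bucket in buckets for k in bucket]


def lsd_radix_sort(keys, key_bits=32, passes=None):
    """Run `passes` passes (default: all, i.e. 32 / 4 = 8 for 32-bit keys)."""
    total = key_bits // RADIX_BITS
    for p in range(total if passes is None else passes):
        keys = radix_pass(keys, p * RADIX_BITS)
    return keys


rng = random.Random(0)
keys = [rng.getrandbits(32) for _ in range(1000)]

# After 4 passes the keys are ordered by their lowest 16 bits only.
partial = lsd_radix_sort(keys, passes=4)
assert [k & 0xFFFF for k in partial] == sorted(k & 0xFFFF for k in keys)

# After all 8 passes the keys are fully sorted.
assert lsd_radix_sort(keys) == sorted(keys)
```

Stability is what makes the partial-ordering guarantee hold: keys with equal digits in the current pass keep the order established by earlier, lower-bit passes.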

Each iteration performs five steps on the data set:

  1. The current 4-bit digit of each key is counted into 16 buckets (values 0-15), so that we know how many times each value occurs throughout the data set.
  2. The occurrence counts go through a reduction phase so that offsets can later be computed on a thread group basis.
  3. The reduced counts go through a prefix scan to calculate the offset of each value group (0-15) on a thread group basis.
  4. The full occurrence buffer then also goes through a prefix scan, and the reduced prefix-scan values are added to it so that data is indexed correctly across all thread groups.
  5. The data set is read one more time, and each key is written to its new sorted offset location. If there is a payload, it is copied over at the same time.
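To make the five steps concrete, here is a sequential Python emulation of one key/payload pass. It is a sketch under stated assumptions: `BLOCK_SIZE` (keys per thread group) and `BLOCKS_PER_REDUCE` (thread groups summed together in the reduction) are hypothetical values, the per-thread-group work runs as ordinary loops, and the scatter ranks keys sequentially within a block to keep the pass stable, where a GPU would compute those local ranks in parallel.

```python
import random

RADIX_BITS = 4
RADIX = 1 << RADIX_BITS     # 16 buckets (values 0-15)
BLOCK_SIZE = 64             # hypothetical: keys handled by one thread group
BLOCKS_PER_REDUCE = 4       # hypothetical: thread groups summed per reduction


def sort_pass(keys, payload, shift):
    """One 4-bit pass over key/payload pairs, following the five steps."""
    n = len(keys)
    n_blocks = (n + BLOCK_SIZE - 1) // BLOCK_SIZE
    n_reduce = (n_blocks + BLOCKS_PER_REDUCE - 1) // BLOCKS_PER_REDUCE

    def digit(k):
        return (k >> shift) & (RADIX - 1)

    def block_range(b):
        return range(b * BLOCK_SIZE, min((b + 1) * BLOCK_SIZE, n))

    # 1. Count: histogram of the current digit for each thread group.
    counts = [[0] * RADIX for _ in range(n_blocks)]
    for b in range(n_blocks):
        for i in block_range(b):
            counts[b][digit(keys[i])] += 1

    # 2. Reduce: sum each digit's counts over groups of thread groups.
    reduced = [[0] * RADIX for _ in range(n_reduce)]
    for b in range(n_blocks):
        for d in range(RADIX):
            reduced[b // BLOCKS_PER_REDUCE][d] += counts[b][d]

    # 3. Exclusive scan of the reduced counts in digit-major order gives the
    #    first output index for each (reduce group, digit) pair.
    base = [[0] * RADIX for _ in range(n_reduce)]
    running = 0
    for d in range(RADIX):
        for r in range(n_reduce):
            base[r][d] = running
            running += reduced[r][d]

    # 4. Scan the full counts inside each reduce group and add the base,
    #    giving every thread group its first output index per digit.
    offsets = [[0] * RADIX for _ in range(n_blocks)]
    for r in range(n_reduce):
        first = r * BLOCKS_PER_REDUCE
        for d in range(RADIX):
            running = base[r][d]
            for b in range(first, min(first + BLOCKS_PER_REDUCE, n_blocks)):
                offsets[b][d] = running
                running += counts[b][d]

    # 5. Scatter: re-read the keys and write each one (and its payload)
    #    to its sorted slot. In-order ranking keeps the pass stable.
    out_keys = [0] * n
    out_payload = [None] * n
    for b in range(n_blocks):
        for i in block_range(b):
            d = digit(keys[i])
            out_keys[offsets[b][d]] = keys[i]
            out_payload[offsets[b][d]] = payload[i]
            offsets[b][d] += 1
    return out_keys, out_payload


def parallel_sort(keys, payload, key_bits=32):
    """All passes: 32-bit keys need 32 / 4 = 8 passes."""
    for shift in range(0, key_bits, RADIX_BITS):
        keys, payload = sort_pass(keys, payload, shift)
    return keys, payload


rng = random.Random(1)
keys = [rng.getrandbits(32) for _ in range(1000)]
values = list(range(len(keys)))  # payload: each key's original index
sorted_keys, sorted_values = parallel_sort(keys, values)
assert sorted_keys == sorted(keys)
assert all(keys[v] == k for k, v in zip(sorted_keys, sorted_values))
```

The digit-major order in step 3 is the key design point: all keys with digit 0 come first, then digit 1, and so on, while within one digit the reduce groups and thread groups keep their original order, which is what makes each pass stable.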

Once all iterations have run (8 iterations for 32-bit keys, since each pass handles 4 bits), the entire data set is sorted.

Additional resources

A set of guidelines for developers on how to present options in the game's user interface to enable or disable AMD FidelityFX effects.

Find out what developers are saying about AMD FidelityFX.

Comparison: GPU particle sorting

[Image comparison: unsorted GPU particles vs. sorted GPU particles]

Comparison: image index buffer sorting

[Image comparison: unsorted image index buffer vs. sorted image index buffer]

Version history

  • Initial release

More AMD FidelityFX effects

Shadow Denoiser

AMD FidelityFX – Denoiser

AMD FidelityFX Denoiser is a set of denoising compute shaders which remove artefacts from reflection and shadow rendering.

Cauldron Framework

Radeon™ Cauldron is our open source experimentation framework for DirectX®12 and Vulkan®.

Our other effects

FEMFX

A multithreaded CPU library for deformable material physics, using the Finite Element Method (FEM).

DepthOfFieldFX

The DepthOfFieldFX library provides a GCN-optimized Compute Shader implementation of Depth of Field using the Fast Filter Spreading approach.

GeometryFX

GeometryFX improves rasterizer efficiency by culling triangles that do not contribute to the output in a pre-pass. This allows the full chip to be used to process geometry, and ensures that the rasterizer only processes triangles that are visible.

ShadowFX

The ShadowFX library provides a scalable GCN-optimized solution for deferred shadow filtering. It supports uniform and contact hardening shadow (CHS) kernels.

TressFX

The TressFX library is AMD’s hair/fur rendering and simulation technology. TressFX is designed to use the GPU to simulate and render high-quality, realistic hair and fur.