FidelityFX Parallel Sort

FidelityFX Parallel Sort GPU documentation.

Structs

Name

Description

FfxParallelSortConstants

Constant buffer information needed for the execution of each pass in parallel sort.

Functions

Return type

Description

void

ffxParallelSortCalculateScratchResourceSize ( uint32_t maxNumKeys, uint32_t& scratchBufferSize, uint32_t& reduceScratchBufferSize )
Calculates the required sizes of the scratch and reduce scratch buffers used by the parallel sort algorithm.

void

ffxParallelSortSetConstantAndDispatchData ( uint32_t numKeys, uint32_t maxThreadGroups, FfxParallelSortConstants& constantBuffer, uint32_t& numThreadGroupsToRun, uint32_t& numReducedThreadGroupsToRun )
Sets up the constant buffer data that must be bound to the GPU for Parallel Sort execution (all passes). The caller must still update the shift value (the bit shift for each pass) manually.

Macros

Name

Description

FFX_PARALLELSORT_ELEMENTS_PER_THREAD 4

The number of elements each running thread processes.

FFX_PARALLELSORT_MAX_THREADGROUPS_TO_RUN 800

The maximum number of thread groups to run in parallel. Changing this value can help or hurt GPU occupancy, and the effect depends heavily on the hardware class.

FFX_PARALLELSORT_SORT_BIN_COUNT (1 << FFX_PARALLELSORT_SORT_BITS_PER_PASS)

The number of bins used for the counting phase of the algorithm. Changing this value requires internal changes to the LDS distribution and to the count, reduce, scan, and scatter passes.

FFX_PARALLELSORT_SORT_BITS_PER_PASS 4

The number of bits sorted per pass. Changing this value requires internal changes to the LDS distribution and to the count, reduce, scan, and scatter passes.

FFX_PARALLELSORT_THREADGROUP_SIZE 128

The number of threads to execute in parallel for each dispatch group.

Detailed description

FidelityFX Parallel Sort GPU documentation.

Global functions

ffxParallelSortCalculateScratchResourceSize

void ffxParallelSortCalculateScratchResourceSize (
    uint32_t maxNumKeys,
    uint32_t& scratchBufferSize,
    uint32_t& reduceScratchBufferSize
)

Calculates the required sizes of the scratch and reduce scratch buffers used by the parallel sort algorithm.

Parameters:

maxNumKeys

The maximum number of keys the algorithm will be asked to sort through.

scratchBufferSize

The size of the scratch buffer that needs to be allocated.

reduceScratchBufferSize

The size of the reduce scratch buffer that needs to be allocated.
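
Both size arguments are outputs passed by reference, so the caller declares two variables, makes the call, and then allocates buffers of the returned sizes. The following is a minimal sketch of that pattern. To compile without the SDK header it uses a local stand-in, calculateScratchResourceSizeSketch, that has the documented parameter list. The stand-in's sizing rule (one 32-bit counter per bin for every 512-key block, with sizes in bytes) is an assumption made for illustration and is not taken from this page. ScratchSizes, queryScratchSizes, and ceilDiv are hypothetical names as well.

```cpp
#include <cassert>
#include <cstdint>

// Values derived from the macros documented on this page.
constexpr uint32_t kKeysPerBlock = 4 * 128; // ELEMENTS_PER_THREAD * THREADGROUP_SIZE
constexpr uint32_t kBinCount     = 1u << 4; // FFX_PARALLELSORT_SORT_BIN_COUNT

// Ceiling division written so it cannot overflow for large inputs.
inline uint32_t ceilDiv(uint32_t a, uint32_t b)
{
    assert(b != 0);
    return a / b + (a % b != 0 ? 1u : 0u);
}

// Stand-in with the documented parameter list.
// ASSUMPTION: the sizing rule below is illustrative only, not the SDK's code.
inline void calculateScratchResourceSizeSketch(uint32_t  maxNumKeys,
                                               uint32_t& scratchBufferSize,
                                               uint32_t& reduceScratchBufferSize)
{
    const uint32_t numBlocks        = ceilDiv(maxNumKeys, kKeysPerBlock);
    const uint32_t numReducedBlocks = ceilDiv(numBlocks, kKeysPerBlock);
    const uint32_t counterBytes     = static_cast<uint32_t>(sizeof(uint32_t));

    scratchBufferSize       = kBinCount * numBlocks * counterBytes;
    reduceScratchBufferSize = kBinCount * numReducedBlocks * counterBytes;
}

// Hypothetical convenience wrapper showing the out-parameter calling pattern.
struct ScratchSizes
{
    uint32_t scratch = 0;
    uint32_t reduceScratch = 0;
};

inline ScratchSizes queryScratchSizes(uint32_t maxNumKeys)
{
    ScratchSizes sizes;
    calculateScratchResourceSizeSketch(maxNumKeys, sizes.scratch, sizes.reduceScratch);
    return sizes;
}
```

Because the argument is the maximum key count, the query is typically made once up front and the resulting buffers are reused for every sort that stays within that limit.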


ffxParallelSortSetConstantAndDispatchData

void ffxParallelSortSetConstantAndDispatchData (
    uint32_t numKeys,
    uint32_t maxThreadGroups,
    FfxParallelSortConstants& constantBuffer,
    uint32_t& numThreadGroupsToRun,
    uint32_t& numReducedThreadGroupsToRun
)

Sets up the constant buffer data that must be bound to the GPU for Parallel Sort execution (all passes). The caller must still update the shift value (the bit shift for each pass) manually.

Parameters:

numKeys

The number of keys the algorithm will be sorting through.

maxThreadGroups

The maximum number of thread groups to use in parallel.

constantBuffer

The FfxParallelSortConstants buffer to fill with information.

numThreadGroupsToRun

The number of thread groups (dispatch size) to run for this sort run.

numReducedThreadGroupsToRun

The number of reduce thread groups (dispatch size) to run for this sort run.
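
Since the shift value is left to the caller, a sort has to update it before every pass. The sketch below shows one way to walk the passes, assuming 32-bit keys and the documented 4 bits per pass, which gives eight passes with shifts 0, 4, ..., 28. SortConstantsSketch is a hypothetical stand-in for FfxParallelSortConstants: this page does not list the struct's members, so the shift member name is an assumption. shiftsForKey and runPasses are illustrative helpers, not SDK functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kBitsPerPass = 4; // FFX_PARALLELSORT_SORT_BITS_PER_PASS

// Hypothetical stand-in for FfxParallelSortConstants. Only a shift member is
// modeled; the real struct's layout is not described on this page.
struct SortConstantsSketch
{
    uint32_t shift = 0;
};

// Returns the bit shift to use for each pass over keys that are keyBits wide.
inline std::vector<uint32_t> shiftsForKey(uint32_t keyBits = 32)
{
    assert(keyBits % kBitsPerPass == 0);
    std::vector<uint32_t> shifts;
    for (uint32_t shift = 0; shift < keyBits; shift += kBitsPerPass)
        shifts.push_back(shift);
    return shifts;
}

// Walks the passes and updates the constants before each (omitted) dispatch.
// Returns the number of passes performed.
inline uint32_t runPasses(SortConstantsSketch& constants, uint32_t keyBits = 32)
{
    uint32_t passes = 0;
    for (uint32_t shift : shiftsForKey(keyBits)) {
        constants.shift = shift; // the per-pass value the caller must set
        // ... upload constants, then dispatch count, reduce, scan, scatter ...
        ++passes;
    }
    return passes;
}
```

Processing the least significant digit first keeps each pass stable with respect to the previous one, which is what lets the final pass leave the keys fully ordered.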


Macros

FFX_PARALLELSORT_ELEMENTS_PER_THREAD

#define FFX_PARALLELSORT_ELEMENTS_PER_THREAD 4

The number of elements each running thread processes.


FFX_PARALLELSORT_MAX_THREADGROUPS_TO_RUN

#define FFX_PARALLELSORT_MAX_THREADGROUPS_TO_RUN 800

The maximum number of thread groups to run in parallel. Changing this value can help or hurt GPU occupancy, and the effect depends heavily on the hardware class.


FFX_PARALLELSORT_SORT_BIN_COUNT

#define FFX_PARALLELSORT_SORT_BIN_COUNT (1 << FFX_PARALLELSORT_SORT_BITS_PER_PASS)

The number of bins used for the counting phase of the algorithm. Changing this value requires internal changes to the LDS distribution and to the count, reduce, scan, and scatter passes.


FFX_PARALLELSORT_SORT_BITS_PER_PASS

#define FFX_PARALLELSORT_SORT_BITS_PER_PASS 4

The number of bits sorted per pass. Changing this value requires internal changes to the LDS distribution and to the count, reduce, scan, and scatter passes.
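
To make the relationship between bits per pass and bin count concrete, here is a single-threaded CPU reference for a least-significant-digit radix sort that uses the same two values: 4 bits per pass and 16 bins. It is a generic textbook sketch with count, scan, and scatter steps. It is not the GPU implementation and leaves out the reduce step and any LDS handling. binIndex and radixSortReference are hypothetical names, and 32-bit keys are assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kBitsPerPass = 4;                  // FFX_PARALLELSORT_SORT_BITS_PER_PASS
constexpr uint32_t kBinCount    = 1u << kBitsPerPass; // FFX_PARALLELSORT_SORT_BIN_COUNT (16)

// Which of the 16 bins a key falls into for the digit starting at bit `shift`.
inline uint32_t binIndex(uint32_t key, uint32_t shift)
{
    assert(shift < 32); // shifting a 32-bit value by 32 or more is undefined
    return (key >> shift) & (kBinCount - 1u);
}

// Single-threaded reference: one count / scan / scatter round per 4-bit digit.
inline std::vector<uint32_t> radixSortReference(std::vector<uint32_t> keys)
{
    std::vector<uint32_t> scratch(keys.size());
    for (uint32_t shift = 0; shift < 32; shift += kBitsPerPass) {
        // Count: histogram of the current digit.
        std::array<std::size_t, kBinCount> counts{};
        for (uint32_t key : keys)
            ++counts[binIndex(key, shift)];

        // Scan: an exclusive prefix sum turns counts into starting offsets.
        std::size_t running = 0;
        for (std::size_t& c : counts) {
            const std::size_t n = c;
            c = running;
            running += n;
        }

        // Scatter: stable placement of each key into its bin's next free slot.
        for (uint32_t key : keys)
            scratch[counts[binIndex(key, shift)]++] = key;
        keys.swap(scratch);
    }
    return keys;
}
```

A reference like this can be handy for validating GPU output on small inputs, since both should produce the same ascending order for unsigned keys.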


FFX_PARALLELSORT_THREADGROUP_SIZE

#define FFX_PARALLELSORT_THREADGROUP_SIZE 128

The number of threads to execute in parallel for each dispatch group.
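
Taken together, the macros imply that one thread group covers 4 × 128 = 512 keys. The sketch below works through that arithmetic and shows how a dispatch size could be capped at the documented maximum of 800 groups. The capping rule is an assumption about how these values are typically combined, not a description of what ffxParallelSortSetConstantAndDispatchData does internally. blocksForKeys and groupsToLaunch are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Values copied from the macro definitions documented on this page.
constexpr uint32_t kElementsPerThread = 4;   // FFX_PARALLELSORT_ELEMENTS_PER_THREAD
constexpr uint32_t kThreadGroupSize   = 128; // FFX_PARALLELSORT_THREADGROUP_SIZE
constexpr uint32_t kMaxThreadGroups   = 800; // FFX_PARALLELSORT_MAX_THREADGROUPS_TO_RUN

// Keys covered by one thread group when every thread handles 4 keys.
constexpr uint32_t kKeysPerGroup = kElementsPerThread * kThreadGroupSize;
static_assert(kKeysPerGroup == 512, "4 elements per thread * 128 threads");

// Number of 512-key blocks needed to cover numKeys (overflow-safe ceiling division).
inline uint32_t blocksForKeys(uint32_t numKeys)
{
    return numKeys / kKeysPerGroup + (numKeys % kKeysPerGroup != 0 ? 1u : 0u);
}

// ASSUMPTION: launch one group per block, but never more than maxThreadGroups.
// When the cap applies, each group would have to process several blocks.
inline uint32_t groupsToLaunch(uint32_t numKeys, uint32_t maxThreadGroups = kMaxThreadGroups)
{
    assert(maxThreadGroups > 0);
    return std::min(blocksForKeys(numKeys), maxThreadGroups);
}
```

Under this assumption the cap starts to apply above 800 × 512 = 409,600 keys, so one million keys would still launch 800 groups.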


Related pages

  • Visit the FidelityFX SDK product page for download links and more information.
