FidelityFX Parallel Sort
FidelityFX Parallel Sort GPU documentation.
Structs
Name | Description |
---|---|
FfxParallelSortConstants | Constant buffer information needed for the execution of each pass in parallel sort. |
Functions
Return type | Function | Description |
---|---|---|
void | ffxParallelSortCalculateScratchResourceSize ( uint32_t maxNumKeys, uint32_t& scratchBufferSize, uint32_t& reduceScratchBufferSize ) | Calculates the required sizes of the scratch and reduce scratch buffers used by the parallel sort algorithm. |
void | ffxParallelSortSetConstantAndDispatchData ( uint32_t numKeys, uint32_t maxThreadGroups, FfxParallelSortConstants & constantBuffer, uint32_t& numThreadGroupsToRun, uint32_t& numReducedThreadGroupsToRun ) | Sets up the constant buffer data that must be bound to the GPU for Parallel Sort execution (all passes). The implementor must update the shift value (the bit shift for each pass) manually. |
Macros
Name | Description |
---|---|
FFX_PARALLELSORT_ELEMENTS_PER_THREAD | The number of elements dealt with per running thread. |
FFX_PARALLELSORT_MAX_THREADGROUPS_TO_RUN | The maximum number of thread groups to run in parallel. Modifying this value can help or hurt GPU occupancy, but is very hardware class specific. |
FFX_PARALLELSORT_SORT_BIN_COUNT (1 << FFX_PARALLELSORT_SORT_BITS_PER_PASS) | The number of bins used for the counting phase of the algorithm. Changing this value requires internal changes in LDS distribution and count, reduce, scan, and scatter passes. |
FFX_PARALLELSORT_SORT_BITS_PER_PASS | The number of bits we are sorting per pass. Changing this value requires internal changes in LDS distribution and count, reduce, scan, and scatter passes. |
FFX_PARALLELSORT_THREADGROUP_SIZE | The number of threads to execute in parallel for each dispatch group. |
Detailed description
FidelityFX Parallel Sort GPU documentation.
Global functions
ffxParallelSortCalculateScratchResourceSize
void ffxParallelSortCalculateScratchResourceSize (
uint32_t maxNumKeys,
uint32_t& scratchBufferSize,
uint32_t& reduceScratchBufferSize
)
Calculates the required sizes of the scratch and reduce scratch buffers used by the parallel sort algorithm.
Parameters:
maxNumKeys | The maximum number of keys the algorithm will be asked to sort. |
scratchBufferSize | Receives the size of the scratch buffer that needs to be allocated. |
reduceScratchBufferSize | Receives the size of the reduce scratch buffer that needs to be allocated. |
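To make the relationship between the key count and the two buffer sizes concrete, the following is a minimal sketch of one plausible sizing scheme. It is not the SDK implementation. The helper name `sketchScratchSizes`, the constant names, and every numeric value are assumptions made for illustration; take the real values from the FFX_PARALLELSORT_* macros in your SDK headers. The sketch assumes each block of keys owns one 32-bit counter per bin, and that the reduce buffer holds one such set of counters per group of blocks.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED values: this reference names the macros but does not state their
// values. Replace these stand-ins with the definitions from your SDK headers.
constexpr uint32_t kElementsPerThread = 4;    // stand-in for FFX_PARALLELSORT_ELEMENTS_PER_THREAD
constexpr uint32_t kThreadGroupSize   = 128;  // stand-in for FFX_PARALLELSORT_THREADGROUP_SIZE
constexpr uint32_t kSortBitsPerPass   = 4;    // stand-in for FFX_PARALLELSORT_SORT_BITS_PER_PASS
constexpr uint32_t kSortBinCount      = 1u << kSortBitsPerPass;

// Integer division rounded up, written so that it cannot overflow.
constexpr uint32_t ceilDiv(uint32_t value, uint32_t divisor)
{
    return value / divisor + (value % divisor != 0 ? 1u : 0u);
}

// Hypothetical helper, NOT the SDK function: sizes both buffers in bytes,
// assuming one 32-bit counter per bin per block of keys.
void sketchScratchSizes(uint32_t maxNumKeys,
                        uint32_t& scratchBufferSize,
                        uint32_t& reduceScratchBufferSize)
{
    assert(maxNumKeys > 0);

    const uint32_t counterBytes = static_cast<uint32_t>(sizeof(uint32_t));
    const uint32_t blockSize    = kElementsPerThread * kThreadGroupSize;

    // Number of key blocks, then number of block groups for the reduce step.
    const uint32_t numBlocks        = ceilDiv(maxNumKeys, blockSize);
    const uint32_t numReducedBlocks = ceilDiv(numBlocks, blockSize);

    scratchBufferSize       = kSortBinCount * numBlocks * counterBytes;
    reduceScratchBufferSize = kSortBinCount * numReducedBlocks * counterBytes;
}
```

In an application, the two output values would typically be queried once for the largest key count expected and then used to create the two GPU buffers up front.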
ffxParallelSortSetConstantAndDispatchData
void ffxParallelSortSetConstantAndDispatchData (
uint32_t numKeys,
uint32_t maxThreadGroups,
FfxParallelSortConstants & constantBuffer,
uint32_t& numThreadGroupsToRun,
uint32_t& numReducedThreadGroupsToRun
)
Sets up the constant buffer data that must be bound to the GPU for Parallel Sort execution (all passes). The implementor must update the shift value (the bit shift for each pass) manually.
Parameters:
numKeys | The number of keys the algorithm will be sorting. |
maxThreadGroups | The maximum number of thread groups to use in parallel. |
constantBuffer | The FfxParallelSortConstants structure to fill in. |
numThreadGroupsToRun | Receives the number of thread groups (dispatch size) to run for this sort run. |
numReducedThreadGroupsToRun | Receives the number of reduce thread groups (dispatch size) to run for this sort run. |
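Because the shift value is left to the caller, a host-side loop has to advance it once per pass. The sketch below shows one way that loop could look. `SortConstantsSketch` is a hypothetical stand-in for FfxParallelSortConstants that models only a `shift` field; the real structure layout is not given in this reference. The 32-bit key width and the 4 bits per pass are likewise assumptions, and `dispatchPass` is a placeholder for whatever uploads the constants and records the GPU dispatches in your renderer.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

constexpr uint32_t kSortBitsPerPass = 4;   // ASSUMED value of FFX_PARALLELSORT_SORT_BITS_PER_PASS
constexpr uint32_t kKeyBits         = 32;  // ASSUMPTION: keys are 32 bits wide
static_assert(kKeyBits % kSortBitsPerPass == 0, "passes must cover the key exactly");

// Hypothetical stand-in for FfxParallelSortConstants (only two fields modelled).
struct SortConstantsSketch {
    uint32_t numKeys = 0;
    uint32_t shift   = 0;  // bit shift selecting the digit sorted in this pass
};

// Runs every radix pass from the least significant digit upwards.
// The shift is updated by hand before each pass, then the caller-supplied
// callback is invoked to upload the constants and dispatch the pass.
// Returns the number of passes executed.
uint32_t runAllPasses(SortConstantsSketch& constants,
                      const std::function<void(const SortConstantsSketch&)>& dispatchPass)
{
    assert(dispatchPass);  // an empty callback would throw when invoked

    uint32_t passes = 0;
    for (uint32_t shift = 0; shift < kKeyBits; shift += kSortBitsPerPass) {
        constants.shift = shift;
        dispatchPass(constants);
        ++passes;
    }
    return passes;
}
```

Under these assumptions the loop performs 32 / 4 = 8 passes. A real integration would also swap the source and destination key buffers between passes.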
Macros
FFX_PARALLELSORT_ELEMENTS_PER_THREAD
The number of elements dealt with per running thread.
FFX_PARALLELSORT_MAX_THREADGROUPS_TO_RUN
The maximum number of thread groups to run in parallel. Modifying this value can help or hurt GPU occupancy, but is very hardware class specific.
FFX_PARALLELSORT_SORT_BIN_COUNT
The number of bins used for the counting phase of the algorithm. Changing this value requires internal changes in LDS distribution and count, reduce, scan, and scatter passes.
FFX_PARALLELSORT_SORT_BITS_PER_PASS
The number of bits we are sorting per pass. Changing this value requires internal changes in LDS distribution and count, reduce, scan, and scatter passes.
FFX_PARALLELSORT_THREADGROUP_SIZE
The number of threads to execute in parallel for each dispatch group.
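The link between the bits sorted per pass and the bin count is easiest to see in a CPU reference. The following sketch is an ordinary single-threaded LSD radix sort, not the GPU implementation: it folds the count, scan, and scatter steps into one function and has no separate reduce step. The function names and the value of 4 bits per pass are assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint32_t kSortBitsPerPass = 4;                       // ASSUMED value
constexpr uint32_t kSortBinCount    = 1u << kSortBitsPerPass;  // same relation as FFX_PARALLELSORT_SORT_BIN_COUNT
constexpr uint32_t kDigitMask       = kSortBinCount - 1;

// One stable pass over the digit selected by `shift`.
void radixPassCpu(const std::vector<uint32_t>& src, std::vector<uint32_t>& dst, uint32_t shift)
{
    assert(dst.size() == src.size());
    std::array<uint32_t, kSortBinCount> offsets{};

    // Count: histogram of the current digit, one counter per bin.
    for (uint32_t key : src) {
        ++offsets[(key >> shift) & kDigitMask];
    }

    // Scan: exclusive prefix sum turns counts into starting offsets.
    uint32_t running = 0;
    for (uint32_t& offset : offsets) {
        const uint32_t count = offset;
        offset = running;
        running += count;
    }

    // Scatter: write each key to the next free slot of its bin.
    for (uint32_t key : src) {
        dst[offsets[(key >> shift) & kDigitMask]++] = key;
    }
}

// Sorts 32-bit keys in ascending order with 32 / kSortBitsPerPass passes.
std::vector<uint32_t> radixSortCpu(std::vector<uint32_t> keys)
{
    std::vector<uint32_t> scratch(keys.size());
    for (uint32_t shift = 0; shift < 32; shift += kSortBitsPerPass) {
        radixPassCpu(keys, scratch, shift);
        std::swap(keys, scratch);  // the output of this pass feeds the next
    }
    return keys;
}
```

A reference like this can be useful for validating GPU results on small inputs, since both should produce the same ascending order for unsigned keys.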