FidelityFX Blur 1.1

FidelityFX Blur is a compute-based, highly-optimized, single-pass Gaussian blur technique.

Requirements

  • HLSL CS_6_0

  • GLSL version 450

Table of contents

Integration guidelines

C++ SDK integration

The easiest way to integrate the FidelityFX Blur effect is by using the FFX SDK C++ API.

Example C++:

Copied!

// Initialize the FFX backend and blur context (do this once)
size_t scratchBufferSize = ffxGetScratchMemorySize (FFX_BLUR_CONTEXT_COUNT );
void* scratchBuffer = malloc(scratchBufferSize);
FfxInterface  backendInterface;
ffxGetInterface (&backendInterface, GetDevice (), scratchBuffer, scratchBufferSize, FFX_BLUR_CONTEXT_COUNT );

FfxBlurContextDescription  desc = {};
desc.backendInterface  = backendInterface;
desc.floatPrecision    = FFX_BLUR_FLOAT_PRECISION_32BIT ;
desc.kernelPermutations  = FFX_BLUR_KERNEL_PERMUTATIONS_ALL ; // mask to support some Guassian sigma kernels or ALL
desc.kernelSizes  = FFX_BLUR_KERNEL_SIZE_ALL ; // mask to support some kernel sizes or ALL

FfxBlurContext  blurContext;
ffxBlurContextCreate (&blurContext, &desc);

// Execute blur effect (multiple times)
FfxBlurDispatchDescription  desc = {};

desc.commandList = ffxGetCommandList (pCmdList);

desc.kernelPermutation = FFX_BLUR_KERNEL_PERMUTATION_2; // Guassian sigma (1.6, 2.8, or 4.0)
desc.kernelSize = FFX_BLUR_KERNEL_SIZE_15x15 ; // Kernel sizes (3x3 ... 21x21)

desc.input = ffxGetResource (inputResource, L"BLUR_InputSrc", FFX_RESOURCE_STATE_PIXEL_COMPUTE_READ );

desc.inputAndOutputSize.width  = desc.input.description.width;
desc.inputAndOutputSize.height = desc.input.description.height;

desc.output = ffxGetResource (outputResource, L"BLUR_Output", FFX_RESOURCE_STATE_UNORDERED_ACCESS );

ffxBlurContextDispatch (&blurContext, &desc);

// When done using the blur effect, clean up.
ffxBlurContextDestroy (&blurContext);

Customization via callbacks and main functions

Blur provides default implementations for blur kernel weights and I/O functions in ffx_blur_callbacks_hlsl.h and ffx_blur_callbacks_glsl.h. SDK users can override them to integrate with their own application.

Blur expects the 1D kernel weights to be embedded in the compiled shader for the maximum performance, with the following requirements:

  • FfxFloat32 or FfxFloat16 type for the kernel weight in the function signatures depending on half precision support. These types are defined in the ffx_core.h FFX SDK header file.

  • A unique set of kernel weights to be defined as a static array, representing the normalized Gaussian weight distribution with elements sorted in decreasing order

Example Blur kernel override:

Copied!

#if FFX_HALF
    #define FFX_BLUR_KERNEL_TYPE FfxFloat16
#else
    #define FFX_BLUR_KERNEL_TYPE FfxFloat32
#endif

inline FFX_BLUR_KERNEL_TYPE GetKernelWeight(int iKernelIndex)
{
    static FFX_BLUR_KERNEL_TYPE kernel_weights[] = { 0.257030201088974, 0.22378581991669, 0.147699079538823 }; // for a 5x5 Gaussian kernel
    return kernel_weights[iKernelIndex];
}
FFX_BLUR_KERNEL_TYPE FfxBlurLoadKernelWeight(FfxInt32 iKernelIndex) { return GetKernelWeight(iKernelIndex); }

Example Blur I/O function override:

Copied!

#if FFX_HALF
  FfxFloat16x3 FfxBlurLoadInput(FfxInt16x2 inPxCoord) { return texColorInput[inPxCoord].rgb; }
  void FfxBlurStoreOutput(FfxInt32x2 outPxCoord, FfxFloat16x3 color) { texColorOutput[outPxCoord] = min16float4(color, 1); }
#else
  FfxFloat32x3 FfxBlurLoadInput(FfxInt32x2 inPxCoord) { return texColorInput[inPxCoord].rgb; }
  void FfxBlurStoreOutput(FfxInt32x2 outPxCoord, FfxFloat32x3 color) { texColorOutput[outPxCoord] = float4(color, 1); }
#endif

It is also possible to integrate the Blur effect via your own main function.

  • The FFX_BLUR_TILE_SIZE_X and FFX_BLUR_TILE_SIZE_Y macros must be defined. Generally these two are set to 8.

Example HLSL:

Copied!

#include "blur/ffx_blur_callbacks_hlsl.h"
#include "blur/ffx_blur_blur.h"

[numthreads(FFX_BLUR_TILE_SIZE_X, FFX_BLUR_TILE_SIZE_Y, 1)]
void CS(
    uint3 LocalThreadId    : SV_GroupThreadID,
    uint3 WorkGroupId      : SV_GroupID,
    uint3 DispatchThreadID : SV_DispatchThreadID)
{
    // Run FidelityFX - Blur
    ffxBlurPass(int2(DispatchThreadID.xy), int2(LocalThreadId.xy), int2(WorkGroupId.xy));
}

Example GLSL:

Copied!

#include "blur/ffx_blur_callbacks_glsl.h"
#include "blur/ffx_blur_blur.h"

layout (local_size_x = FFX_BLUR_TILE_SIZE_X, local_size_y = FFX_BLUR_TILE_SIZE_Y, local_size_z = 1) in;
void main()
{
    // Run FidelityFX Blur
    ffxBlurPass(
        FfxInt32x2(gl_GlobalInvocationID.xy),
        FfxInt32x2(gl_LocalInvocationID.xy),
        FfxInt32x2(gl_WorkGroupID.xy));
}

Under the hood: Gaussian blur

Blur pass

Algorithm

Blur employs a 3-step algorithm implemented in a single compute dispatch by utilizing thread group shared memory to store the intermediate blur results.

The three steps are as follows:

  • Pre-fill step

    • Fill the cache with horizontal blur from the image source

  • Loop top-down

    • Fill the cache with horizontal blur from the image source

    • Do vertical blur from cached results

    • Write out to the destination resource

  • Loop over the last few tiles

    • Vertical blur from cache

    • Write out to the destination resource

Dispatch time complexity

Wave execution time complexity

Group Shared Memory space complexity

O(W * H * K)

O(H * K)

O(Th * (Tw + K))

where:

  • W : Image Width

  • H : Image Height

  • K : 1D Kernel Size

  • Tw : Blur Tile Size X

  • Th : Blur Tile Size Y

Work distribution

Blur works with 8×8 thread groups, covering a ‘tile’ in the image.

The work distribution is influenced by the following #define s

Work distribution visualized

Shared memory

 

 

Blur uses a region of shared memory as a ring-buffer of tiles to store the intermediate results. , The thread group, corresponding to a single tile, is sampling the source image and running a horizontal blur before storing the results in the shared memory. , The thread group later on reads from the shared memory and runs a vertical blur on the intermediate results before storing the fully blurred image in the destination texture.

Work distribution visualized

Further reading

  • GDC 2019: A Blend of GCN Optimization & Color Processing, Part I: Store Caching in Separable Filters PDF View Link

  • GDC 2018: Engine Optimization Hot Lap PPTX Download Link

  • Rastergrid: Efficient Blur w/ Linear Sampling Webpage Link

See also

Related pages

  • Visit the FidelityFX SDK product page for download links and more information.

Looking for more documentation on GPUOpen?

AMD GPUOpen software blogs

Our handy software release blogs will help you make good use of our tools, SDKs, and effects, as well as sharing the latest features with new releases.

GPUOpen Manuals

Don’t miss our manual documentation! And if slide decks are what you’re after, you’ll find 100+ of our finest presentations here.

AMD GPUOpen Performance Guides

The home of great performance and optimization advice for AMD RDNAâ„¢ 2 GPUs, AMD Ryzenâ„¢ CPUs, and so much more.

Getting started: AMD GPUOpen software

New or fairly new to AMD’s tools, libraries, and effects? This is the best place to get started on GPUOpen!

AMD GPUOpen Getting Started Development and Performance

Looking for tips on getting started with developing and/or optimizing your game, whether on AMD hardware or generally? We’ve got you covered!

AMD GPUOpen Technical blogs

Browse our technical blogs, and find valuable advice on developing with AMD hardware, ray tracing, Vulkan®, DirectX®, Unreal Engine, and lots more.

Find out more about our software!

AMD GPUOpen Effects - AMD FidelityFX technologies

Create wonder. No black boxes. Meet the AMD FidelityFX SDK!

AMD GPUOpen Samples

Browse all our useful samples. Perfect for when you’re needing to get started, want to integrate one of our libraries, and much more.

AMD GPUOpen developer SDKs

Discover what our SDK technologies can offer you. Query hardware or software, manage memory, create rendering applications or machine learning, and much more!

AMD GPUOpen Developer Tools

Analyze, Optimize, Profile, Benchmark. We provide you with the developer tools you need to make sure your game is the best it can be!