| GCN Shader Extensions for Direct3D and Vulkan

The GCN architecture contains a lot of functionality in the shader cores that is not exposed in current APIs like Vulkan™ or Direct3D® 12. One of the mandates of GPUOpen is to give developers better access to the hardware, and this post details extensions for these APIs that expose additional GCN features to developers.

Shader Extensions

These shader extensions provide access to wavefront-wide functions, which are important building blocks for exploiting the SIMD execution model of GPUs. For instance, mbcnt and ballot can replace atomics in various cases, which can boost performance drastically. The wavefront-wide instructions also include swizzles, which allow individual lanes to exchange data without going through memory.
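To make the ballot and mbcnt idea concrete, here is a minimal CPU-side sketch in C++ that models one 64-lane wavefront. It is not shader code and does not use any AMD API: the names `ballot`, `mbcnt`, `countBits` and `compactWave` are hypothetical helpers written for this illustration, and the 64-lane wave size is an assumption. The point it shows is that each lane can compute its own output slot from the ballot mask, so the shared counter only needs to be bumped once per wavefront instead of once per lane.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative CPU model of a single wavefront (assumed to be 64 lanes wide).
constexpr unsigned kWaveSize = 64;

// Number of set bits in a 64-bit mask.
inline unsigned countBits(std::uint64_t mask) {
    return static_cast<unsigned>(std::bitset<64>(mask).count());
}

// Model of ballot: bit i of the result is set when lane i's predicate is true.
inline std::uint64_t ballot(const std::vector<bool>& predicate) {
    assert(predicate.size() == kWaveSize);
    std::uint64_t mask = 0;
    for (unsigned lane = 0; lane < kWaveSize; ++lane)
        if (predicate[lane]) mask |= (std::uint64_t{1} << lane);
    return mask;
}

// Model of mbcnt: number of set bits in `mask` strictly below `lane`.
inline unsigned mbcnt(std::uint64_t mask, unsigned lane) {
    assert(lane < kWaveSize);
    const std::uint64_t below = (std::uint64_t{1} << lane) - 1;  // lane 0 -> empty mask
    return countBits(mask & below);
}

// Compacts the kept values of one wavefront into `out`.
// Assumes out.size() == counter on entry. `counter` stands in for a shared
// counter that a real shader would bump with one atomic add per wavefront.
inline void compactWave(const std::vector<int>& values,
                        const std::vector<bool>& keep,
                        std::vector<int>& out,
                        unsigned& counter) {
    assert(values.size() == kWaveSize);
    const std::uint64_t mask = ballot(keep);
    const unsigned base = counter;   // single "atomic add" for the whole wavefront
    counter += countBits(mask);
    out.resize(counter);
    for (unsigned lane = 0; lane < kWaveSize; ++lane)
        if (keep[lane]) out[base + mbcnt(mask, lane)] = values[lane];
}
```

In a real shader the loop over lanes disappears, since every lane runs the body in parallel; the sketch only mirrors the index arithmetic.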

Additionally, we expose readfirstlane and other functions which enable the compiler to move data from vector registers (VGPRs) into scalar registers (SGPRs). Especially for VGPR-heavy code, marking variables as wavefront-uniform can reduce the VGPR count significantly.
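As a rough illustration of why readfirstlane yields a wavefront-uniform value, the following C++ sketch models the operation on the CPU. The helper names `firstActiveLane` and `readFirstLane`, the 64-lane wave size, and the representation of the execution mask as a 64-bit integer are assumptions made for this example. Because every lane receives the same result, a compiler is free to keep that result in a single scalar register rather than one vector register slot per lane.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative CPU model of a single wavefront (assumed to be 64 lanes wide).
constexpr unsigned kWaveSize = 64;

// Index of the lowest set bit in the execution mask, i.e. the first active lane.
// The mask must have at least one active lane.
inline unsigned firstActiveLane(std::uint64_t exec) {
    assert(exec != 0);
    unsigned lane = 0;
    while (((exec >> lane) & 1u) == 0) ++lane;
    return lane;
}

// Model of readfirstlane: returns the value held by the first active lane.
// The result does not depend on which lane asks, so it is wavefront-uniform.
template <typename T>
T readFirstLane(const std::array<T, kWaveSize>& perLane, std::uint64_t exec) {
    return perLane[firstActiveLane(exec)];
}
```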

Another often-requested feature which is getting exposed today is direct access to the barycentric coordinates. This is again an important building block for various algorithms.

Finally, we also provide various utility functions. In this release, we’re providing the 3-parameter min, max and med functions which map directly to the corresponding GCN opcodes.
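For reference, the semantics of the 3-parameter functions can be sketched in plain C++ as below. This is only a model of what the functions compute, written with `std::min` and `std::max`; it says nothing about how the hardware handles special floating-point values such as NaN. The `clampViaMed3` helper is a hypothetical addition that shows one common use of a median-of-three: clamping a value to a range.

```cpp
#include <algorithm>
#include <cassert>

// Smallest of three values.
template <typename T>
T min3(T a, T b, T c) { return std::min(a, std::min(b, c)); }

// Largest of three values.
template <typename T>
T max3(T a, T b, T c) { return std::max(a, std::max(b, c)); }

// Median (middle value) of three values.
template <typename T>
T med3(T a, T b, T c) {
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Clamping x to [lo, hi] is the median of the three values when lo <= hi.
template <typename T>
T clampViaMed3(T x, T lo, T hi) {
    assert(lo <= hi);
    return med3(x, lo, hi);
}
```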

Direct3D Access

In Direct3D, the shader extensions are exposed through the AMD GPU Services (AGS) library. For more information on AGS, visit GPUOpen’s AGS page.

AGS allows you to query which of the extension functions are present. The functions have to be enabled before you can load a shader that uses them – this guarantees that your shader is compatible with the underlying hardware.

In your code, all you need to do is include amd_ags.h in your C/C++ source file, initialize AGS with agsInit, and then check for shader extension support as shown in the code below:

#include "amd_ags.h"
// call agsInit prior to D3D12 device creation
[...]
// bitmask that receives one flag per supported extension intrinsic
unsigned int extensionsSupported = 0;
if (agsDriverExtensionsDX12_Init(m_agsContext, m_device.Get(), &extensionsSupported) == AGS_SUCCESS)
{
    if (extensionsSupported & AGS_DX12_EXTENSION_INTRINSIC_BARYCENTRICS)
    {
        // the Barycentric extension is supported so use this codepath
    }
}

In your HLSL code you need to add the following line at the top of your file:

#include "ags_shader_intrinsics_dx12.hlsl"

And then you can call the new function this way:

float2 barycentric = AmdExtD3DShaderIntrinsics_IjBarycentricCoords(AmdExtD3DShaderIntrinsicsBarycentric_LinearCenter);
return float3(barycentric.x, barycentric.y, 1.0 - (barycentric.x + barycentric.y));
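To show what the three weights are typically used for, here is a small C++ sketch that expands an (I, J) pair into three barycentric weights, exactly as the HLSL snippet does, and then blends three per-vertex values with them. The `Barycentrics` struct and the `expand` and `interpolate` helpers are hypothetical names for this illustration, and the mapping of weights to vertices (first weight to vertex 0, and so on) is an assumption, not something this post specifies.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Three barycentric weights that sum to one.
struct Barycentrics {
    float i, j, k;
};

// Expands the (I, J) pair to three weights, mirroring the HLSL snippet:
// the third weight is whatever remains after the first two.
inline Barycentrics expand(float i, float j) {
    return {i, j, 1.0f - (i + j)};
}

// Blends three per-vertex values with the given weights.
// ASSUMPTION: weight i belongs to vertex 0, j to vertex 1, k to vertex 2.
inline float interpolate(const std::array<float, 3>& vertexValues,
                         const Barycentrics& b) {
    assert(std::fabs(b.i + b.j + b.k - 1.0f) < 1e-4f);  // sanity check on the weights
    return b.i * vertexValues[0] + b.j * vertexValues[1] + b.k * vertexValues[2];
}
```

With direct access to the weights, a shader can apply its own interpolation or read the per-vertex values individually, instead of relying only on the fixed-function interpolated result.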

Vulkan Access

In Vulkan, the shader extensions are grouped into three separate extensions:

  • AMD_shader_ballot
  • AMD_shader_trinary_minmax
  • AMD_shader_explicit_vertex_parameter

These are device extensions, and you need to check that they are available before you can use them. Similar to Direct3D, we’re grouping the functions, so there’s no need to check each one individually. Keep in mind that these are SPIR-V extensions, because Vulkan does not consume GLSL directly. To use them from GLSL, we’re also releasing an update to glslangValidator with support for those extensions.
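The availability check can be sketched in C++ as follows. To keep the example self-contained it works on a plain list of extension name strings; in a real application that list would come from `vkEnumerateDeviceExtensionProperties`. The sketch assumes the Vulkan device extension strings carry the usual `VK_` prefix, and the helper names `hasExtension`, `missingExtensions` and `exampleUsage` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Device extension names for the three groups listed above
// (assumed to use the standard VK_ prefix).
const std::vector<std::string> kAmdShaderExtensions = {
    "VK_AMD_shader_ballot",
    "VK_AMD_shader_trinary_minmax",
    "VK_AMD_shader_explicit_vertex_parameter",
};

// True when `name` appears in the list of available device extensions.
inline bool hasExtension(const std::vector<std::string>& available,
                         const std::string& name) {
    return std::find(available.begin(), available.end(), name) != available.end();
}

// Returns the AMD shader extensions that the device does not report.
inline std::vector<std::string> missingExtensions(
        const std::vector<std::string>& available) {
    std::vector<std::string> missing;
    for (const std::string& name : kAmdShaderExtensions)
        if (!hasExtension(available, name)) missing.push_back(name);
    return missing;
}

// Usage: `available` would normally be filled from
// vkEnumerateDeviceExtensionProperties for the chosen physical device.
inline void exampleUsage() {
    const std::vector<std::string> available = {"VK_KHR_swapchain",
                                                "VK_AMD_shader_ballot"};
    assert(hasExtension(available, "VK_AMD_shader_ballot"));
    assert(missingExtensions(available).size() == 2);
}
```

Extensions that are present still have to be listed in the enabled extension names when the logical device is created.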

Extension Overview

Here’s an overview of all the extensions that we expose, and the corresponding names for HLSL and GLSL.

| HLSL | GLSL | Description |
|---|---|---|
| Readfirstlane | readFirstInvocationARB | Read a value from the first active lane |
| Readlane | readInvocationARB | Read a value from any lane |
| N/A | writeInvocationAMD | Write a value to a different lane |
| LaneId | gl_SubGroupInvocationARB | Get the current lane id |
| N/A | swizzleInvocationsAMD | Exchange data across lanes |
| Swizzle | swizzleInvocationsMaskedAMD | Exchange data across lanes |
| Ballot | ballotARB | Read the execution mask |
| MBCnt | mbcntAMD | Count number of active lanes with an index less than the current lane |
| Min3F/Min3U | min3 | 3-parameter min |
| Med3F/Med3U | mid3 | 3-parameter median |
| Max3F/Max3U | max3 | 3-parameter max |
| BarycentricCoords(LinearCenter) | gl_BaryCoordNoPerspAMD | Read barycentrics with linear interpolation |
| BarycentricCoords(LinearCentroid) | gl_BaryCoordNoPerspCentroidAMD | Read barycentrics with linear interpolation |
| BarycentricCoords(LinearSample) | gl_BaryCoordNoPerspSampleAMD | Read barycentrics with linear interpolation |
| BarycentricCoords(PerspCenter) | gl_BaryCoordSmoothAMD | Read barycentrics with perspective interpolation |
| BarycentricCoords(PerspCentroid) | gl_BaryCoordSmoothCentroidAMD | Read barycentrics with perspective interpolation |
| BarycentricCoords(PerspSample) | gl_BaryCoordSmoothSampleAMD | Read barycentrics with perspective interpolation |
| BarycentricCoords(PerspPullModel) | gl_BaryCoordPullModelAMD | Select the pull model for barycentrics |
| VertexParameter | interpolateAtVertexAMD | Interpolate a vertex parameter |

Samples

The following GPUOpen samples demonstrate how to access the extensions:

Vulkan mbcnt Sample

This sample shows how to use the AMD_shader_ballot extension and mbcnt to perform a fast reduction within a wavefront.

| RESOURCES

AMD GPU Services (AGS) Library

The AMD GPU Services (AGS) library provides software developers with the ability to query AMD GPU software and hardware state information that is not normally available through standard operating systems or graphics APIs.

DirectX®12

Microsoft® DirectX®12 provides APIs for creating games and other graphics applications. Find out how we can help you get the best out of DirectX®12.

Vulkan®

Vulkan® gives software developers control over the performance, efficiency, and capabilities of AMD Radeon™ GPUs and multi-core CPUs.

Matthaeus Chajdas
Matthäus Chajdas is a developer technology engineer at AMD. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.
