Introduction

Radeon™ GPU Analyzer (RGA) 2.2 introduces support for DirectX®12 compute shaders in a new mode (-s dx12) of the command line tool. You can use the new mode to generate GCN/RDNA ISA disassembly for your compute shaders, regardless of the GPU that is physically installed in your machine. For example, you can compile your shaders for the AMD Radeon™ RX 5700 target (gfx1010) of the Navi generation even if your machine does not have a Navi-class card installed. In addition, you can use the new mode to obtain hardware resource usage statistics for your compute shaders, such as VGPR/SGPR consumption and LDS usage, and to generate live register analysis reports or control flow graphs.

A big advantage of the new mode over the previous HLSL mode (-s hlsl, which has been renamed -s dx11 in RGA 2.2) is that it uses the live driver, so your shaders go through the same compilation path that they take in a real application. The results are therefore as close as possible to what happens in the real-life case, which helps you make better optimization decisions.

Usage

Prerequisites

The new mode requires the latest AMD Adrenalin driver.

How it works

Before getting to usage examples, let’s first understand how RGA for Direct3D® 12 works. There are two types of use cases: HLSL as an input and DXIL/DXBC blob as an input. In this mode, in order to provide you with the closest results to the real-life case, RGA does not compile a standalone compute shader, but rather a compute pipeline. Therefore, you have to pass a valid root signature along with your shader for the compilation to succeed. Let’s examine the two use cases:

  1. HLSL as input
    The following diagram shows the compilation process of an HLSL file in the DX12 mode:

    If the shader model is 5.1 or above, RGA passes the HLSL code through DXC as the front-end compiler and uses the generated DXIL binary in the created pipeline. If the shader model is 5.0 or below, RGA uses the runtime compiler (D3DCompiler*.dll) to compile HLSL to DXBC and uses the generated DXBC in the pipeline.

    When using HLSL source code as your input, you can provide your root signature to RGA in three ways:

    1. [RootSignature] attribute for your shader in the .hlsl file
      Shader model 5.1 allows you to define the root signature in a macro definition in the HLSL source code, and reference it from a shader using the [RootSignature] attribute. Here is an example: 
      #define _RootSig \
      "RootFlags(0), " \
      "CBV(b0), " \
      "DescriptorTable(SRV(t0, numDescriptors = 2))," \
      "DescriptorTable(UAV(u0, numDescriptors = 2))"
      
      [RootSignature(_RootSig)]
      void main(...)

      If you use this approach, RGA can automatically detect the root signature and compile it with the shader.

    2. Macro definition in the same .hlsl file or in an included file (use: --rs-macro)
      Another option is to define a root signature in a macro inside the .hlsl file where the shader is defined (or in a separate, included, .hlsl file), without referencing it using the [RootSignature] attribute. For example, suppose that we had the same code as in the first bullet above, but without the [RootSignature(_RootSig)] attribute: 
      #define _RootSig \
      "RootFlags(0), " \
      "CBV(b0), " \
      "DescriptorTable(SRV(t0, numDescriptors = 2))," \
      "DescriptorTable(UAV(u0, numDescriptors = 2))"
      
      void main(...)

      In this scenario, you can pass the tool the name of the macro definition through the --rs-macro command line option. RGA will then perform a two-pass front-end compilation: first to compile the root signature, and then to compile the shader.

      Note:

      • If _RootSig is defined in a separate .hlsl file (which is not included from your main .hlsl file), you also have to pass --rs-macro-hlsl, with the path to the HLSL file where the macro is defined as its argument, in addition to --rs-macro.
      • By default, RGA assumes version 1.1 (“rootsig_1_1”) as the root signature version. To force a different root signature version for your macro, use the --rs-macro-version option.
    3. Pre-compiled binary root signature (use: --rs-bin)
      If you have your root signature in a pre-compiled binary file, use the --rs-bin command line switch with the full path to your binary as the argument. Note that you can extract a root signature from a compiled blob at runtime using the D3DGetBlobPart() function (pass in D3D_BLOB_ROOT_SIGNATURE for the blob part), or use FXC to extract the root signature from a compiled DXBC blob.
  2. DXIL or DXBC blob as input
    If the root signature has been compiled into the blob, no additional argument is required: RGA uses the root signature from the blob. Otherwise, provide a compiled root signature using --rs-bin.
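The root signature cases above map onto a small set of extra command line arguments. The following Python sketch summarizes that mapping. The function name and its parameters are hypothetical helpers introduced only for illustration; the option names are the ones described in the text, and the version string passed to --rs-macro-version is assumed to follow the same "rootsig_1_x" form as the default.

```python
from typing import List, Optional


def root_signature_args(
    has_attribute: bool = False,
    macro_name: Optional[str] = None,
    macro_version: Optional[str] = None,
    binary_path: Optional[str] = None,
) -> List[str]:
    """Return the extra RGA arguments for one root signature case.

    Hypothetical helper: it only mirrors the cases described in the text.
    """
    if has_attribute:
        # Case 1.1 (or a blob with an embedded root signature):
        # RGA detects the root signature itself, nothing extra to pass.
        return []
    if macro_name is not None:
        # Case 1.2: name the macro; optionally override the default version.
        args = ["--rs-macro", macro_name]
        if macro_version is not None:
            # Assumed value format, e.g. "rootsig_1_0".
            args += ["--rs-macro-version", macro_version]
        return args
    if binary_path is not None:
        # Case 1.3, or a blob without an embedded root signature.
        return ["--rs-bin", binary_path]
    raise ValueError("a compute pipeline needs a root signature source")
```

Raising an error in the last branch reflects the point made earlier: RGA compiles a compute pipeline, so a shader without any root signature source cannot be compiled in this mode.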

Examples

The following examples use the D3D12RaytracingMiniEngineSample from Microsoft’s DirectX Graphics Samples. Let’s assume that we are trying to compile FillLightGridCS_8.hlsl, and, for simplicity, that the .hlsl file and the rest of this sample’s shaders are located under C:\sample on our system.
The following command would compile the shader and generate the disassembly for the latest supported target GPU:

rga.exe -s dx12 --cs C:\sample\FillLightGridCS_8.hlsl --cs-entry main --cs-model cs_5_1 --isa C:\sample\disassembly.txt

Let’s break down the command:

  • --cs points to the compute shader that we want to compile
  • --cs-entry provides the name of the entry point (the compute shader)
  • --cs-model provides the shader model (also known as the “target” in D3D terminology)
  • --isa points to the full path of the output text file where the disassembly is stored
  • Since we did not use the --asic or -c options to set the target, it defaults to the latest target supported by the installed driver.
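To pin the target instead of relying on the default, the same command can be assembled with the --asic option. The Python sketch below builds the argument list for the gfx1010 target mentioned in the introduction. The function name and its parameters are hypothetical; only the options listed above are used, and the script does not run RGA itself.

```python
from typing import List, Optional


def build_rga_command(shader: str, isa_out: str, entry: str = "main",
                      model: str = "cs_5_1",
                      target: Optional[str] = None) -> List[str]:
    """Assemble an RGA DX12-mode command line (hypothetical helper)."""
    cmd = ["rga.exe", "-s", "dx12"]
    if target is not None:
        # Without --asic, RGA picks the latest target the driver supports.
        cmd += ["--asic", target]
    cmd += ["--cs", shader, "--cs-entry", entry,
            "--cs-model", model, "--isa", isa_out]
    return cmd


cmd = build_rga_command(r"C:\sample\FillLightGridCS_8.hlsl",
                        r"C:\sample\disassembly.txt",
                        target="gfx1010")
print(" ".join(cmd))
# prints: rga.exe -s dx12 --asic gfx1010 --cs C:\sample\FillLightGridCS_8.hlsl --cs-entry main --cs-model cs_5_1 --isa C:\sample\disassembly.txt
```

On a machine with RGA installed, the list could be passed to subprocess.run(cmd, check=True) to perform the actual compilation.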

Note that we did not explicitly provide a root signature, yet it compiles. This is because this sample contains a root signature in the HLSL code, and also contains the [RootSignature()] attribute for the shader:

#define _RootSig \
    "RootFlags(0), " \
    "CBV(b0), " \
    "DescriptorTable(SRV(t0, numDescriptors = 2))," \
    "DescriptorTable(UAV(u0, numDescriptors = 2))"

[RootSignature(_RootSig)]
[numthreads(WORK_GROUP_SIZE_X, WORK_GROUP_SIZE_Y, WORK_GROUP_SIZE_Z)]
void main(…)

Therefore, this falls under case 1.1 in the Usage section above, and the compilation succeeds with the following build log:

Performing front-end compilation through DXC... front-end compilation success.
Compiling compute pipeline...
Extracting compute shader disassembly...
Compute shader disassembly extracted successfully.
Succeeded.

Note that we are using shader model 5.1, so DXC performs the front-end compilation, as you can see in the build log. If we changed the argument for --cs-model from cs_5_1 to cs_5_0, we would observe that DXC is no longer used and that the runtime compiler is used instead (as you can see in the diagram that appears in the beginning of this article).
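The front-end selection rule can be written down in a few lines. The sketch below is only an illustration of the rule as described in this article, not RGA's actual implementation; the function name and the returned labels are hypothetical.

```python
def front_end_for(cs_model: str) -> str:
    """Name the front-end compiler that the rule above selects.

    Expects a model string such as "cs_5_1" (stage, major, minor).
    """
    _, major, minor = cs_model.split("_")
    version = (int(major), int(minor))
    # Shader model 5.1 and above goes through DXC and yields DXIL;
    # 5.0 and below goes through D3DCompiler*.dll and yields DXBC.
    if version >= (5, 1):
        return "DXC (DXIL)"
    return "D3DCompiler (DXBC)"
```

Comparing the version as a tuple keeps the check correct for later shader models as well, for example cs_6_0.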

Now, let’s tweak the sample and comment out the [RootSignature(_RootSig)] attribute:

//[RootSignature(_RootSig)]
[numthreads(WORK_GROUP_SIZE_X, WORK_GROUP_SIZE_Y, WORK_GROUP_SIZE_Z)]
void main(...)

This simulates a scenario where the root signature is defined in the HLSL source code (or in an included HLSL file) but not referenced using the [RootSignature()] attribute. In that case, trying to compile the tweaked shader with the same command fails, since the compiler is not instructed to use the _RootSig macro as the root signature for our shader. To address that, we add the --rs-macro switch to our command with _RootSig as its argument:

rga.exe -s dx12 --cs C:\sample\FillLightGridCS_8.hlsl --cs-entry main --cs-model cs_5_1 --rs-macro _RootSig --isa C:\sample\disassembly.txt

Now we fall under case 1.2 above: RGA performs a two-pass compilation, first compiling the root signature and then the compute pipeline, and the compilation succeeds.
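When scripting builds for many shaders, it can help to check up front whether a file still carries an active [RootSignature()] attribute, and therefore whether --rs-macro is needed. The following Python sketch does this with a simple text scan. It is a rough heuristic under stated assumptions: it only ignores // line comments, does not handle block comments, included files, or preprocessor conditionals, and is unrelated to how RGA itself detects the attribute. The function name is hypothetical.

```python
import re
from typing import Optional

_LINE_COMMENT = re.compile(r"//[^\n]*")
_ATTRIBUTE = re.compile(r"\[\s*RootSignature\s*\(\s*(\w+)\s*\)\s*\]")


def referenced_root_signature(hlsl_source: str) -> Optional[str]:
    """Return the macro named by an active [RootSignature(...)] attribute.

    Returns None when no attribute is found outside of // comments.
    """
    # Drop // comments so a commented-out attribute is not counted.
    code = _LINE_COMMENT.sub("", hlsl_source)
    match = _ATTRIBUTE.search(code)
    return match.group(1) if match else None
```

If the function returns None for a file that defines the macro, the command needs --rs-macro with the macro name, as in the example above.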

Finally, let’s assume that the _RootSig macro is missing from the file as well, and that we have the root signature pre-compiled in a binary file named RootSig.rs.fxo. This is the most common case in production, since pre-compiling the root signature is the best practice from a performance perspective. To compile the shader, we tweak the command to include the --rs-bin switch with the full path to RootSig.rs.fxo as the argument:

rga.exe -s dx12 --cs C:\sample\FillLightGridCS_8.hlsl --cs-entry main --cs-model cs_5_1 --rs-bin C:\Samples\RootSig.rs.fxo --isa C:\sample\disassembly.txt

Now we fall under case 1.3 above: the root signature is taken from the pre-compiled file, and the compilation succeeds.
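A pre-compiled root signature file such as RootSig.rs.fxo is typically produced with FXC. The Python sketch below assembles two FXC command lines: one that compiles a root signature macro into a standalone binary, and one that extracts the root signature from an existing compiled blob. The FXC switches follow Microsoft's documentation on specifying root signatures in HLSL; the function names and the example file names are hypothetical, and the script only builds the argument lists without running FXC.

```python
from typing import List


def fxc_compile_root_signature(hlsl: str, macro: str, out: str,
                               version: str = "rootsig_1_1") -> List[str]:
    """Compile a root signature macro into a standalone binary file."""
    # /T selects the root signature target, /E names the macro to compile.
    return ["fxc.exe", "/T", version, hlsl, "/E", macro, "/Fo", out]


def fxc_extract_root_signature(blob: str, out: str) -> List[str]:
    """Extract the root signature embedded in a compiled DXBC blob."""
    return ["fxc.exe", "/dumpbin", blob, "/extractrootsignature", "/Fo", out]


print(" ".join(fxc_compile_root_signature(
    "FillLightGridCS_8.hlsl", "_RootSig", "RootSig.rs.fxo")))
# prints: fxc.exe /T rootsig_1_1 FillLightGridCS_8.hlsl /E _RootSig /Fo RootSig.rs.fxo
```

Either output file can then be passed to RGA through --rs-bin, as in the command above.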

Acknowledgements

Code samples used herein are from Microsoft®’s DirectX Graphics Samples and are © Microsoft 2015 and subject to the MIT License.

Resources

Radeon™ GPU Analyzer

Radeon GPU Analyzer is an offline compiler and performance analysis tool for DirectX®, Vulkan®, SPIR-V™, OpenGL® and OpenCL™.

AMD GPUOpen ISA guides

AMD ISA Documentation

Instruction Set Architecture (ISA) documentation provides a guide for directly accessing the hardware.