Using Radeon™ GPU Analyzer with DirectX®12 Graphics

A key difference between the new DirectX 12 mode (-s dx12) and the older DirectX 11 mode (-s dx11, previously named -s hlsl) is that the DirectX 12 mode uses the live driver and follows the same compilation path as a real-world DirectX 12 application. As a result, the disassembly and hardware resource usage statistics that it generates are the closest to the real-world case, which helps you make better performance optimization decisions.

To compile a DirectX 12 graphics pipeline, you need to provide the following inputs to the tool, in addition to the HLSL source files:

  • Root signature: The root signature can be either defined in the HLSL source code or provided in a pre-compiled binary file, as described in our previous article.
  • .gpso file: For compute pipelines, the HLSL source code together with a valid root signature is enough for a successful compilation of the pipeline. For graphics pipelines, however, a subset of the D3D12 graphics pipeline state is required as well. Without that additional data, RGA cannot properly set the pipeline state for your shaders, and the compilation fails. The subset of the graphics pipeline state that RGA requires is defined in a custom .gpso file of the following format:
    
    
    # schemaVersion
    1.0
    
    # InputLayoutNumElements: Number of D3D12_INPUT_ELEMENT_DESC elements in the D3D12_INPUT_LAYOUT_DESC structure.  
    # Must match the following "InputLayout" section.
    2
    
    # InputLayout 
    # { SemanticName, SemanticIndex, Format, InputSlot, AlignedByteOffset, InputSlotClass, InstanceDataStepRate } 
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    { "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
    
    # PrimitiveTopologyType: The D3D12_PRIMITIVE_TOPOLOGY_TYPE value to be used when creating the PSO.
    D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE
    
    # NumRenderTargets: The number of formats in the upcoming RTVFormats section.
    1
    
    # RTVFormats: An array of DXGI_FORMAT-typed values for the render target formats.
    # The number of items in the array should match the above NumRenderTargets section.
    { DXGI_FORMAT_R8G8B8A8_UNORM }

    To generate a template .gpso file, which you can then edit manually to match your pipeline, run:
    rga -s dx12 --gpso-template "full path to output file"
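
The two count fields in the format above (InputLayoutNumElements and NumRenderTargets) must agree with the sections that follow them, so it can be handy to check a hand-edited file before passing it to RGA. The following is a minimal sketch of such a check, not RGA's own parser: the function name `check_gpso_counts` is hypothetical, and the script assumes that lines starting with `#` are comments and that the sections appear in the same order as in the template shown above.

```python
# A trimmed copy of the example .gpso contents, used as test input.
SAMPLE_GPSO = """\
# schemaVersion
1.0

# InputLayoutNumElements
2

# InputLayout
{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
{ "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }

# PrimitiveTopologyType
D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE

# NumRenderTargets
1

# RTVFormats
{ DXGI_FORMAT_R8G8B8A8_UNORM }
"""


def check_gpso_counts(text):
    """Return a list of count mismatches found in .gpso text (empty if none).

    Assumption: sections appear in the template's order, and every
    non-blank line that does not start with '#' is a value line.
    """
    values = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    problems = []

    # values[0] is the schema version; values[1] is InputLayoutNumElements.
    declared_elements = int(values[1])
    index = 2
    layout_entries = 0
    # Input layout entries are the brace-delimited lines that follow the count.
    while index < len(values) and values[index].startswith("{"):
        layout_entries += 1
        index += 1
    if layout_entries != declared_elements:
        problems.append(
            f"InputLayoutNumElements is {declared_elements}, "
            f"but InputLayout has {layout_entries} entries"
        )

    # values[index] is the primitive topology type; the render target
    # count and the RTVFormats array come right after it.
    declared_targets = int(values[index + 1])
    formats = [
        item.strip()
        for item in values[index + 2].strip("{} ").split(",")
        if item.strip()
    ]
    if len(formats) != declared_targets:
        problems.append(
            f"NumRenderTargets is {declared_targets}, "
            f"but RTVFormats has {len(formats)} formats"
        )
    return problems


print(check_gpso_counts(SAMPLE_GPSO))
```

An empty list means both counts are consistent. The check says nothing about whether the values themselves are valid for your pipeline.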

Example

In the following example, we will use the D3D12HelloTriangle sample from Microsoft’s DirectX Graphics Samples. The pipeline has two very simple shaders, both defined in shaders.hlsl: VSMain is the vertex shader and PSMain is the pixel shader.

Let’s start by generating a template .gpso file:

rga -s dx12 --gpso-template C:\shaders\hellotriangle.gpso

Now, we will tweak the file’s contents to match our source code. Let’s have a look at D3D12HelloTriangle.cpp where we can find the input layout definition:


// Define the vertex input layout.
D3D12_INPUT_ELEMENT_DESC inputElementDescs[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    { "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
};

Let’s copy the two input layout lines into the InputLayout section of the .gpso file and set the InputLayoutNumElements value to 2.
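
The AlignedByteOffset of 12 on the COLOR element follows from the POSITION format: DXGI_FORMAT_R32G32B32_FLOAT is three 32-bit floats, so the next element starts 12 bytes into the vertex. The sketch below reproduces that arithmetic with a stand-in structure. The `Vertex` layout (three position floats followed by four color floats) is an assumption made for illustration and is not taken from the sample's source.

```python
import ctypes


class Vertex(ctypes.Structure):
    # Hypothetical stand-in for the vertex type: a float3 position
    # followed by a float4 color, matching the two input elements.
    _fields_ = [
        ("position", ctypes.c_float * 3),  # DXGI_FORMAT_R32G32B32_FLOAT
        ("color", ctypes.c_float * 4),     # DXGI_FORMAT_R32G32B32A32_FLOAT
    ]


# Byte offset of each field from the start of the vertex.
position_offset = Vertex.position.offset
color_offset = Vertex.color.offset
# Total size of one vertex in bytes.
vertex_stride = ctypes.sizeof(Vertex)

print(position_offset, color_offset, vertex_stride)  # prints: 0 12 28
```

If you add or reorder vertex attributes, the same calculation gives the offsets to enter in the InputLayout section.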

Now, another quick look at the .cpp file shows that there is a single render target with a format of DXGI_FORMAT_R8G8B8A8_UNORM:


psoDesc.NumRenderTargets = 1;
psoDesc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;

Let’s update the NumRenderTargets and RTVFormats sections accordingly, so that we end up with a .gpso file that looks like this:


# schemaVersion
1.0

# InputLayoutNumElements: Number of D3D12_INPUT_ELEMENT_DESC elements in the D3D12_INPUT_LAYOUT_DESC structure.  
# Must match the following "InputLayout" section.
2

# InputLayout 
# { SemanticName, SemanticIndex, Format, InputSlot, AlignedByteOffset, InputSlotClass, InstanceDataStepRate } 
{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
{ "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }

# PrimitiveTopologyType: The D3D12_PRIMITIVE_TOPOLOGY_TYPE value to be used when creating the PSO.
D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE

# NumRenderTargets: The number of formats in the upcoming RTVFormats section.
1

# RTVFormats: An array of DXGI_FORMAT-typed values for the render target formats.
# The number of items in the array should match the above NumRenderTargets section.
{ DXGI_FORMAT_R8G8B8A8_UNORM }

All we have to do now is run the RGA command line tool with the following command:


rga -s dx12 --vs C:\shaders\shaders.hlsl --ps C:\shaders\shaders.hlsl --vs-model "vs_6_0" --ps-model "ps_6_0" 
    --vs-entry VSMain --ps-entry PSMain --isa C:\output\isa.txt --rs-bin C:\RootSignatures\hellotriangle.rs.fxo 
    --gpso C:\shaders\hellotriangle.gpso

Here, --rs-bin points to the pre-compiled root signature binary file. For more information about root signatures in RGA, see our previous article.

Since both the vertex and pixel shaders are defined in the same file and use the same shader model, we can use the --all-hlsl and --all-model options to make our command a bit less verbose:


rga -s dx12 --all-hlsl C:\shaders\shaders.hlsl --all-model "6_0" --vs-entry VSMain --ps-entry PSMain 
--isa C:\output\isa.txt --rs-bin C:\RootSignatures\hellotriangle.rs.fxo --gpso C:\shaders\hellotriangle.gpso
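
If you compile the same pipeline repeatedly, for example from a build script, it may help to assemble the command from named parameters. The following is one possible sketch: the helper name `build_rga_dx12_args` is hypothetical, it uses only the options shown in the command above, and it only builds the argument list without launching RGA.

```python
def build_rga_dx12_args(hlsl, model, vs_entry, ps_entry, isa_out, rs_bin, gpso):
    """Build the argument list for the shortened DX12 command shown above."""
    return [
        "rga", "-s", "dx12",
        "--all-hlsl", hlsl,       # one HLSL file that holds all shader stages
        "--all-model", model,     # shader model shared by all stages
        "--vs-entry", vs_entry,
        "--ps-entry", ps_entry,
        "--isa", isa_out,         # where to write the disassembly
        "--rs-bin", rs_bin,       # pre-compiled root signature binary
        "--gpso", gpso,           # graphics pipeline state file
    ]


args = build_rga_dx12_args(
    hlsl=r"C:\shaders\shaders.hlsl",
    model="6_0",
    vs_entry="VSMain",
    ps_entry="PSMain",
    isa_out=r"C:\output\isa.txt",
    rs_bin=r"C:\RootSignatures\hellotriangle.rs.fxo",
    gpso=r"C:\shaders\hellotriangle.gpso",
)
print(" ".join(args))
```

Passing the list to `subprocess.run(args, check=True)` would start the build, assuming that the rga executable is on your PATH.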

That’s it. After a successful build, we get the disassembly in the output folder:


; -------- Disassembly --------------------
shader main
  asic(GFX10)
  type(PS)
  sgpr_count(6)
  vgpr_count(8)
  wave_size(64)

  s_inst_prefetch  0x0003                               // 000000000000: BFA00003
  s_mov_b32     m0, s2                                  // 000000000004: BEFC0302
  v_interp_p1_f32  v2, v0, attr0.x                      // 000000000008: C8080000
  v_interp_p1_f32  v3, v0, attr0.y                      // 00000000000C: C80C0100
  v_interp_p1_f32  v4, v0, attr0.z                      // 000000000010: C8100200
  v_interp_p1_f32  v0, v0, attr0.w                      // 000000000014: C8000300
  v_interp_p2_f32  v2, v1, attr0.x                      // 000000000018: C8090001
  v_interp_p2_f32  v3, v1, attr0.y                      // 00000000001C: C80D0101
  v_interp_p2_f32  v4, v1, attr0.z                      // 000000000020: C8110201
  v_interp_p2_f32  v0, v1, attr0.w                      // 000000000024: C8010301
  v_cvt_pkrtz_f16_f32  v2, v2, v3                       // 000000000028: 5E040702
  v_cvt_pkrtz_f16_f32  v3, v4, v0                       // 00000000002C: 5E060104
  exp           mrt0, v2, v2, v3, v3 done compr vm      // 000000000030: F8001C0F 00000302
  s_endpgm                                              // 000000000038: BF810000
  s_code_end                                            // 00000000003C: BF9F0000
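
The header at the top of the listing (asic, type, sgpr_count, vgpr_count, wave_size) is convenient to track over time, for example to notice when a shader change increases register usage. The following is a minimal sketch of extracting those fields. The function name `parse_isa_header` is hypothetical, and the `name(value)` layout is assumed from the single listing shown above, so other targets or RGA versions may format the header differently.

```python
import re

# Matches header lines such as "  vgpr_count(8)" or "  asic(GFX10)".
HEADER_FIELD = re.compile(
    r"^\s*(asic|type|sgpr_count|vgpr_count|wave_size)\((\w+)\)\s*$"
)

SAMPLE_HEADER = """\
; -------- Disassembly --------------------
shader main
  asic(GFX10)
  type(PS)
  sgpr_count(6)
  vgpr_count(8)
  wave_size(64)
"""


def parse_isa_header(text):
    """Collect the header fields of a disassembly listing into a dict.

    Numeric values are converted to int; other values stay as strings.
    """
    fields = {}
    for line in text.splitlines():
        match = HEADER_FIELD.match(line)
        if match:
            name, value = match.groups()
            fields[name] = int(value) if value.isdigit() else value
    return fields


header = parse_isa_header(SAMPLE_HEADER)
print(header["vgpr_count"], header["sgpr_count"], header["wave_size"])
```

Instruction lines do not match the pattern, so the function can be given the full contents of the file written by --isa.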

In addition to the --isa option, which generates the disassembly, you can use the -a option, which generates the hardware resource usage statistics for each shader in the pipeline, or the --livereg option, which creates a live VGPR analysis report based on the generated disassembly.

For more information about the available options, run rga -s dx12 -h.

Acknowledgements

Code samples used herein are from Microsoft’s DirectX Graphics Samples and are © Microsoft 2015 and subject to the MIT License.

Resources

RGA

Radeon™ GPU Analyzer

Radeon GPU Analyzer is an offline compiler and performance analysis tool for DirectX®, Vulkan®, SPIR-V™, OpenGL® and OpenCL™.

AMD ISA Documentation

Instruction Set Architecture (ISA) documentation provides a guide for directly accessing the hardware.

Amit Ben-Moshe

Amit Ben-Moshe is a Technical Lead and a Principal Member of Technical Staff at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.
