A key difference between the new DirectX 12 mode (-s dx12) and the older DirectX 11 mode (-s dx11, previously named -s hlsl) is that the DirectX 12 mode uses the live driver and follows the same compilation path as a real-world DirectX 12 application. As a result, the disassembly and hardware resource usage statistics that it generates are as close as possible to the real-world case, which helps you make better performance optimization decisions.

To compile a DirectX 12 graphics pipeline, you need to provide the following inputs to the tool, in addition to the HLSL source files:

  • Root signature: The root signature can be either defined in the HLSL source code or provided in a pre-compiled binary file, as described in our previous article.
  • .gpso file: For compute pipelines, the HLSL source code together with a valid root signature is enough to compile the pipeline successfully. For graphics, however, a subset of the D3D12 graphics pipeline state is required as well. Without that additional data, RGA cannot properly set the pipeline state for your shaders, and the compilation fails. The subset of the graphics pipeline state that RGA requires is defined in a custom .gpso file with the following format:
    
    
    # schemaVersion
    1.0
    
    # InputLayoutNumElements: Number of D3D12_INPUT_ELEMENT_DESC elements in the D3D12_INPUT_LAYOUT_DESC structure.  
    # Must match the following "InputLayout" section.
    2
    
    # InputLayout 
    # { SemanticName, SemanticIndex, Format, InputSlot, AlignedByteOffset, InputSlotClass, InstanceDataStepRate } 
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    { "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
    
    # PrimitiveTopologyType: The D3D12_PRIMITIVE_TOPOLOGY_TYPE value to be used when creating the PSO.
    D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE
    
    # NumRenderTargets: The number of formats in the upcoming RTVFormats section.
    1
    
    # RTVFormats: An array of DXGI_FORMAT-typed values for the render target formats.
    # The number of items in the array should match the above NumRenderTargets section.
    { DXGI_FORMAT_R8G8B8A8_UNORM }

    You can generate a template .gpso file and then edit it manually to match your pipeline by running:
    rga -s dx12 --gpso-template "full path to output file"
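Because the InputLayoutNumElements value must match the number of entries in the InputLayout section, a quick consistency check before invoking RGA can save a failed build. The following is a minimal sketch of such a check, not part of RGA itself. The helper names GpsoValueLines and InputLayoutCountMatches are hypothetical, and the code assumes the layout shown in the template above: comment lines start with #, the schema version comes first, then the element count, then one input element per line.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of RGA): returns the non-comment, non-blank
// lines of a .gpso file, with leading and trailing whitespace removed.
std::vector<std::string> GpsoValueLines(const std::string& text) {
    std::vector<std::string> values;
    std::istringstream stream(text);
    std::string line;
    while (std::getline(stream, line)) {
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos || line[first] == '#') {
            continue;  // Skip blank lines and '#' comment lines.
        }
        const auto last = line.find_last_not_of(" \t\r");
        values.push_back(line.substr(first, last - first + 1));
    }
    return values;
}

// Returns true if InputLayoutNumElements equals the number of "{ ... }"
// element lines that directly follow it. Assumes the template's ordering:
// schema version, element count, then one input element per line.
bool InputLayoutCountMatches(const std::string& text) {
    const std::vector<std::string> values = GpsoValueLines(text);
    if (values.size() < 2) {
        return false;
    }
    int declared = 0;
    try {
        declared = std::stoi(values[1]);
    } catch (const std::exception&) {
        return false;  // The count line is not a number.
    }
    int found = 0;
    for (std::size_t i = 2; i < values.size() && values[i].front() == '{'; ++i) {
        ++found;
    }
    return declared == found;
}
```

Passing the full text of a .gpso file to InputLayoutCountMatches returns false when the declared count and the listed elements disagree. The same idea could be extended to compare NumRenderTargets against the RTVFormats array.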

Example

In the following example, we will use the D3D12HelloTriangle sample from Microsoft’s DirectX Graphics Samples. The pipeline has two very simple shaders, both defined in shaders.hlsl: VSMain is the vertex shader and PSMain is the pixel shader.

Let’s start by generating a template .gpso file:

rga -s dx12 --gpso-template C:\shaders\hellotriangle.gpso

Now, we will tweak the file’s contents to match our source code. Let’s have a look at D3D12HelloTriangle.cpp where we can find the input layout definition:


// Define the vertex input layout.
D3D12_INPUT_ELEMENT_DESC inputElementDescs[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    { "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
};

Let’s copy the two input layout lines into the InputLayout section of the .gpso file and set the InputLayoutNumElements value to 2.
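The AlignedByteOffset of 12 for the COLOR element follows from the vertex layout: a three-component float position occupies 12 bytes, so the color starts right after it. The sketch below illustrates this with a stand-in Vertex structure. The structure and constant names are assumptions made for illustration, and plain float arrays are used in place of the sample's own vector types so that the snippet compiles without any DirectX headers.

```cpp
#include <cstddef>

// Assumed vertex structure: a 3-float position followed by a 4-float color.
// Plain float arrays stand in for the vector types used by a real sample.
struct Vertex {
    float position[3];  // Matches DXGI_FORMAT_R32G32B32_FLOAT (12 bytes).
    float color[4];     // Matches DXGI_FORMAT_R32G32B32A32_FLOAT (16 bytes).
};

// Byte offsets that correspond to the AlignedByteOffset field of each
// input element in the InputLayout section.
constexpr std::size_t kPositionOffset = offsetof(Vertex, position);
constexpr std::size_t kColorOffset = offsetof(Vertex, color);

// Compile-time checks against the values used in the .gpso file.
static_assert(kPositionOffset == 0, "POSITION starts at byte 0");
static_assert(kColorOffset == 12, "COLOR follows three 4-byte floats");
```

If your own vertex structure differs, the offsets in the InputLayout section need to be adjusted in the same way.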

Now, another quick look at the .cpp file shows that there is a single render target with a format of DXGI_FORMAT_R8G8B8A8_UNORM:


psoDesc.NumRenderTargets = 1;
psoDesc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;

Let’s update the NumRenderTargets and RTVFormats sections accordingly, so that we end up with a .gpso file that looks like this:


# schemaVersion
1.0

# InputLayoutNumElements: Number of D3D12_INPUT_ELEMENT_DESC elements in the D3D12_INPUT_LAYOUT_DESC structure.  
# Must match the following "InputLayout" section.
2

# InputLayout 
# { SemanticName, SemanticIndex, Format, InputSlot, AlignedByteOffset, InputSlotClass, InstanceDataStepRate } 
{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
{ "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }

# PrimitiveTopologyType: The D3D12_PRIMITIVE_TOPOLOGY_TYPE value to be used when creating the PSO.
D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE

# NumRenderTargets: The number of formats in the upcoming RTVFormats section.
1

# RTVFormats: An array of DXGI_FORMAT-typed values for the render target formats.
# The number of items in the array should match the above NumRenderTargets section.
{ DXGI_FORMAT_R8G8B8A8_UNORM }

All we have to do now is run the RGA command line tool with the following command:


rga -s dx12 --vs C:\shaders\shaders.hlsl --ps C:\shaders\shaders.hlsl --vs-model "vs_6_0" --ps-model "ps_6_0" --vs-entry VSMain --ps-entry PSMain --isa C:\output\isa.txt --rs-bin C:\RootSignatures\hellotriangle.rs.fxo --gpso C:\shaders\hellotriangle.gpso

Here, --rs-bin points to the pre-compiled root signature binary file. For more information about root signatures in RGA, see our previous article.

Since both the vertex and pixel shaders are defined in the same file and use the same shader model, we can use the --all-hlsl and --all-model options to make our command a bit less verbose:


rga -s dx12 --all-hlsl C:\shaders\shaders.hlsl --all-model "6_0" --vs-entry VSMain --ps-entry PSMain --isa C:\output\isa.txt --rs-bin C:\RootSignatures\hellotriangle.rs.fxo --gpso C:\shaders\hellotriangle.gpso

That’s it. After a successful build, we get the disassembly in the output folder:


; -------- Disassembly --------------------
shader main
  asic(GFX10)
  type(PS)
  sgpr_count(6)
  vgpr_count(8)
  wave_size(64)

  s_inst_prefetch  0x0003                               // 000000000000: BFA00003
  s_mov_b32     m0, s2                                  // 000000000004: BEFC0302
  v_interp_p1_f32  v2, v0, attr0.x                      // 000000000008: C8080000
  v_interp_p1_f32  v3, v0, attr0.y                      // 00000000000C: C80C0100
  v_interp_p1_f32  v4, v0, attr0.z                      // 000000000010: C8100200
  v_interp_p1_f32  v0, v0, attr0.w                      // 000000000014: C8000300
  v_interp_p2_f32  v2, v1, attr0.x                      // 000000000018: C8090001
  v_interp_p2_f32  v3, v1, attr0.y                      // 00000000001C: C80D0101
  v_interp_p2_f32  v4, v1, attr0.z                      // 000000000020: C8110201
  v_interp_p2_f32  v0, v1, attr0.w                      // 000000000024: C8010301
  v_cvt_pkrtz_f16_f32  v2, v2, v3                       // 000000000028: 5E040702
  v_cvt_pkrtz_f16_f32  v3, v4, v0                       // 00000000002C: 5E060104
  exp           mrt0, v2, v2, v3, v3 done compr vm      // 000000000030: F8001C0F 00000302
  s_endpgm                                              // 000000000038: BF810000
  s_code_end                                            // 00000000003C: BF9F0000

In addition to the --isa option, which generates the disassembly, you can use the -a option to generate hardware resource usage statistics for each shader in the pipeline, or the --livereg option to create a live VGPR analysis report based on the generated disassembly.
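The header of the disassembly listing already carries a few resource figures, such as sgpr_count, vgpr_count and wave_size. If you want to track them across builds, a small text scan is enough. The following is a minimal sketch that assumes the key(value) header format shown in the listing above. HeaderCount is a hypothetical helper, not an RGA API, and it returns -1 when the key is missing or its value is not a plain number.

```cpp
#include <string>

// Hypothetical helper (not an RGA API): finds "<key>(<digits>)" in the
// disassembly text and returns the number, or -1 if the key is missing
// or the value between the parentheses is not a plain decimal number.
int HeaderCount(const std::string& isa, const std::string& key) {
    const std::string token = key + "(";
    const auto start = isa.find(token);
    if (start == std::string::npos) {
        return -1;
    }
    const auto digits_begin = start + token.size();
    const auto digits_end = isa.find(')', digits_begin);
    if (digits_end == std::string::npos || digits_end == digits_begin) {
        return -1;
    }
    int value = 0;
    for (auto i = digits_begin; i < digits_end; ++i) {
        if (isa[i] < '0' || isa[i] > '9') {
            return -1;  // Non-numeric value, for example asic(GFX10).
        }
        value = value * 10 + (isa[i] - '0');
    }
    return value;
}
```

For the pixel shader listing above, HeaderCount(text, "vgpr_count") would yield 8 and HeaderCount(text, "sgpr_count") would yield 6. For complete per-shader statistics, the -a option remains the intended route.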

For more information about the available options, run rga -s dx12 -h.

Acknowledgements

Code samples used herein are from Microsoft’s DirectX Graphics Samples and are © Microsoft 2015 and subject to the MIT License.

Resources

Radeon™ GPU Analyzer

Radeon GPU Analyzer is an offline compiler and performance analysis tool for DirectX®, Vulkan®, SPIR-V™, OpenGL® and OpenCL™.