A key difference between the new DirectX 12 mode (-s dx12) and the older DirectX 11 mode (-s dx11, previously named -s hlsl) is that the DirectX 12 mode uses the live driver and follows the same compilation path as a real-world DirectX 12 application. As a result, the generated disassembly and hardware resource usage statistics are closest to the real-world case, which helps you make better performance optimization decisions.
To compile a DirectX 12 graphics pipeline, you need to provide the following inputs to the tool, in addition to the HLSL source files:
- Root signature: The root signature can be either defined in the HLSL source code or provided in a pre-compiled binary file, as described in our previous article.
- .gpso file: For compute pipelines, the HLSL source code together with a valid root signature is enough to compile the pipeline successfully. For graphics, however, a subset of the D3D12 graphics pipeline state is required as well. Without that additional data, RGA cannot properly set the pipeline state for your shaders, and compilation fails. The subset of the graphics pipeline state that RGA requires is defined in a custom .gpso file of the following format:
# schemaVersion
1.0
# InputLayoutNumElements: Number of D3D12_INPUT_ELEMENT_DESC elements in the D3D12_INPUT_LAYOUT_DESC structure.
# Must match the following "InputLayout" section.
2
# InputLayout
# { SemanticName, SemanticIndex, Format, InputSlot, AlignedByteOffset, InputSlotClass, InstanceDataStepRate }
{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
{ "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
# PrimitiveTopologyType: The D3D12_PRIMITIVE_TOPOLOGY_TYPE value to be used when creating the PSO.
D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE
# NumRenderTargets: The number of formats in the upcoming RTVFormats section.
1
# RTVFormats: An array of DXGI_FORMAT-typed values for the render target formats.
# The number of items in the array should match the above NumRenderTargets section.
{ DXGI_FORMAT_R8G8B8A8_UNORM }
You can generate a template .gpso file and then edit it manually to match your pipeline by running:
rga -s dx12 --gpso-template "full path to output file"
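If the pipeline description already lives in a build script, the same file could also be produced programmatically instead of edited by hand. The following Python sketch renders the sections of the format described above from plain values. The `InputElement` class and the `render_gpso` function are hypothetical helpers, not part of RGA, and the comma-separated layout for more than one render target format is an assumption, since only a single-format file is shown in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputElement:
    """Hypothetical mirror of one D3D12_INPUT_ELEMENT_DESC entry."""
    semantic_name: str
    semantic_index: int
    fmt: str
    input_slot: int
    aligned_byte_offset: int
    input_slot_class: str
    instance_data_step_rate: int

    def render(self) -> str:
        # Same field order as the "# InputLayout" comment in the .gpso file.
        return (
            f'{{ "{self.semantic_name}", {self.semantic_index}, {self.fmt}, '
            f"{self.input_slot}, {self.aligned_byte_offset}, "
            f"{self.input_slot_class}, {self.instance_data_step_rate} }}"
        )


def render_gpso(elements: List[InputElement], topology: str,
                rtv_formats: List[str]) -> str:
    """Render the .gpso sections in the order used by the format above."""
    lines = [
        "# schemaVersion",
        "1.0",
        "# InputLayoutNumElements: Number of D3D12_INPUT_ELEMENT_DESC "
        "elements in the D3D12_INPUT_LAYOUT_DESC structure.",
        '# Must match the following "InputLayout" section.',
        str(len(elements)),  # derived, so it cannot drift from the layout
        "# InputLayout",
        "# { SemanticName, SemanticIndex, Format, InputSlot, "
        "AlignedByteOffset, InputSlotClass, InstanceDataStepRate }",
        ",\n".join(e.render() for e in elements),
        "# PrimitiveTopologyType: The D3D12_PRIMITIVE_TOPOLOGY_TYPE value "
        "to be used when creating the PSO.",
        topology,
        "# NumRenderTargets: The number of formats in the upcoming "
        "RTVFormats section.",
        str(len(rtv_formats)),
        "# RTVFormats: An array of DXGI_FORMAT-typed values for the render "
        "target formats.",
        "# The number of items in the array should match the above "
        "NumRenderTargets section.",
        # Assumption: multiple formats are comma-separated inside the braces.
        "{ " + ", ".join(rtv_formats) + " }",
    ]
    return "\n".join(lines) + "\n"


per_vertex = "D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA"
gpso_text = render_gpso(
    [
        InputElement("POSITION", 0, "DXGI_FORMAT_R32G32B32_FLOAT", 0, 0, per_vertex, 0),
        InputElement("COLOR", 0, "DXGI_FORMAT_R32G32B32A32_FLOAT", 0, 12, per_vertex, 0),
    ],
    "D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE",
    ["DXGI_FORMAT_R8G8B8A8_UNORM"],
)
print(gpso_text)
```

Deriving both counts from the lists avoids the most likely manual mistake, a count that no longer matches the section that follows it.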
Example
In the following example, we will use the D3D12HelloTriangle sample from Microsoft’s DirectX Graphics Samples. The pipeline has two very simple shaders, both defined in shaders.hlsl: VSMain is the vertex shader and PSMain is the pixel shader.
Let’s start by generating a template .gpso file:
rga -s dx12 --gpso-template C:\shaders\hellotriangle.gpso
Now, we will tweak the file’s contents to match our source code. Let’s have a look at D3D12HelloTriangle.cpp, where we can find the input layout definition:
// Define the vertex input layout.
D3D12_INPUT_ELEMENT_DESC inputElementDescs[] =
{
{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
{ "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
};
Let’s copy the two input layout lines under the InputLayout section and adjust the InputLayoutNumElements value to 2.
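For larger input layouts, the copy step could be scripted. The Python sketch below pulls the element initializers out of a C++ snippet with a regular expression and prints the element count followed by the lines to paste. The `extract_input_elements` function is a hypothetical helper, and the pattern assumes each element is written as a brace-enclosed literal with exactly seven fields and no macros, comments, or nested braces inside it.

```python
import re
from typing import List

# The input layout from D3D12HelloTriangle.cpp, as quoted above.
CPP_SNIPPET = """
    // Define the vertex input layout.
    D3D12_INPUT_ELEMENT_DESC inputElementDescs[] =
    {
        { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
        { "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
    };
"""

# One element: '{', a quoted semantic name, six more comma-separated fields, '}'.
ELEMENT_RE = re.compile(r'\{\s*"[A-Za-z_]\w*"\s*(?:,\s*[^,{}]+){6}\s*\}')


def extract_input_elements(cpp_source: str) -> List[str]:
    """Return each input element initializer with whitespace normalized."""
    return [re.sub(r"\s+", " ", m.group(0)) for m in ELEMENT_RE.finditer(cpp_source)]


elements = extract_input_elements(CPP_SNIPPET)
print(len(elements))          # value for InputLayoutNumElements
print(",\n".join(elements))   # lines for the InputLayout section
```

A regular expression is only a convenience here; a layout built from macros or computed offsets would still need to be transcribed by hand.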
Now, another quick look at the .cpp file shows that there is a single render target with a format of DXGI_FORMAT_R8G8B8A8_UNORM:
psoDesc.NumRenderTargets = 1;
psoDesc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;
Let’s update the NumRenderTargets and RTVFormats sections accordingly, so that we end up with a .gpso file that looks like this:
# schemaVersion
1.0
# InputLayoutNumElements: Number of D3D12_INPUT_ELEMENT_DESC elements in the D3D12_INPUT_LAYOUT_DESC structure.
# Must match the following "InputLayout" section.
2
# InputLayout
# { SemanticName, SemanticIndex, Format, InputSlot, AlignedByteOffset, InputSlotClass, InstanceDataStepRate }
{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
{ "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
# PrimitiveTopologyType: The D3D12_PRIMITIVE_TOPOLOGY_TYPE value to be used when creating the PSO.
D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE
# NumRenderTargets: The number of formats in the upcoming RTVFormats section.
1
# RTVFormats: An array of DXGI_FORMAT-typed values for the render target formats.
# The number of items in the array should match the above NumRenderTargets section.
{ DXGI_FORMAT_R8G8B8A8_UNORM }
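Because the two count fields must agree with the sections that follow them, a quick consistency check before invoking RGA can save a failed build. The Python sketch below is one possible way to do that. It is not how RGA itself parses the file: `split_sections` and `check_gpso` are hypothetical helpers, and they assume that each section starts with a `#` comment whose first word is the section name, that each input element sits on its own line, and that render target formats are comma-separated inside one pair of braces.

```python
from typing import Dict, List

SECTIONS = ("schemaVersion", "InputLayoutNumElements", "InputLayout",
            "PrimitiveTopologyType", "NumRenderTargets", "RTVFormats")


def split_sections(text: str) -> Dict[str, List[str]]:
    """Group the non-comment lines under the most recent section comment."""
    sections: Dict[str, List[str]] = {name: [] for name in SECTIONS}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            words = line.lstrip("#").split()
            name = words[0].rstrip(":") if words else ""
            if name in sections:      # other comments are plain descriptions
                current = name
            continue
        if current is not None:
            sections[current].append(line)
    return sections


def check_gpso(text: str) -> List[str]:
    """Return a list of human-readable problems; empty means consistent."""
    s = split_sections(text)
    problems = [f"missing section: {n}" for n in SECTIONS if not s[n]]
    if problems:
        return problems
    declared = int(s["InputLayoutNumElements"][0])
    if declared != len(s["InputLayout"]):
        problems.append(f"InputLayoutNumElements is {declared} but "
                        f"InputLayout has {len(s['InputLayout'])} elements")
    joined = " ".join(s["RTVFormats"]).strip("{} ")
    formats = [f.strip() for f in joined.split(",") if f.strip()]
    declared = int(s["NumRenderTargets"][0])
    if declared != len(formats):
        problems.append(f"NumRenderTargets is {declared} but "
                        f"RTVFormats has {len(formats)} formats")
    return problems


# A deliberately inconsistent file: the count says 3, the layout has 2 entries.
BROKEN = """\
# schemaVersion
1.0
# InputLayoutNumElements: must match the InputLayout section.
3
# InputLayout
{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
{ "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
# PrimitiveTopologyType
D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE
# NumRenderTargets
1
# RTVFormats
{ DXGI_FORMAT_R8G8B8A8_UNORM }
"""
for problem in check_gpso(BROKEN):
    print(problem)
```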
All we have to do now is run the RGA command line tool with the following command:
rga -s dx12 --vs C:\shaders\shaders.hlsl --ps C:\shaders\shaders.hlsl --vs-model "vs_6_0" --ps-model "ps_6_0"
--vs-entry VSMain --ps-entry PSMain --isa C:\output\isa.txt --rs-bin C:\RootSignatures\hellotriangle.rs.fxo
--gpso C:\shaders\hellotriangle.gpso
Here, --rs-bin points to the pre-compiled root signature binary file. For more information about root signatures in RGA, see our previous article.
Since both the vertex and pixel shaders are defined in the same file and use the same shader model, we can use the --all-hlsl and --all-model options to make our command a bit less verbose:
rga -s dx12 --all-hlsl C:\shaders\shaders.hlsl --all-model "6_0" --vs-entry VSMain --ps-entry PSMain
--isa C:\output\isa.txt --rs-bin C:\RootSignatures\hellotriangle.rs.fxo --gpso C:\shaders\hellotriangle.gpso
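When the command is assembled by a build script, the choice between the two forms can be made automatically. The Python sketch below builds the argument list using only the options shown in this article; `build_rga_command` is a hypothetical helper, and it conservatively falls back to the fully explicit per-stage form unless both the source file and the shader model version match.

```python
from typing import List


def build_rga_command(vs_file: str, ps_file: str, vs_model: str, ps_model: str,
                      vs_entry: str, ps_entry: str, isa_out: str,
                      rs_bin: str, gpso: str) -> List[str]:
    """Assemble an 'rga -s dx12' argument list for a VS + PS pipeline."""
    cmd = ["rga", "-s", "dx12"]
    # "vs_6_0" -> "6_0": the version part shared by --all-model.
    vs_version = vs_model.split("_", 1)[1]
    ps_version = ps_model.split("_", 1)[1]
    if vs_file == ps_file and vs_version == ps_version:
        cmd += ["--all-hlsl", vs_file, "--all-model", vs_version]
    else:
        cmd += ["--vs", vs_file, "--ps", ps_file,
                "--vs-model", vs_model, "--ps-model", ps_model]
    cmd += ["--vs-entry", vs_entry, "--ps-entry", ps_entry,
            "--isa", isa_out, "--rs-bin", rs_bin, "--gpso", gpso]
    return cmd


command = build_rga_command(
    vs_file=r"C:\shaders\shaders.hlsl", ps_file=r"C:\shaders\shaders.hlsl",
    vs_model="vs_6_0", ps_model="ps_6_0",
    vs_entry="VSMain", ps_entry="PSMain",
    isa_out=r"C:\output\isa.txt",
    rs_bin=r"C:\RootSignatures\hellotriangle.rs.fxo",
    gpso=r"C:\shaders\hellotriangle.gpso",
)
print(" ".join(command))
```

The resulting list can be passed as-is to `subprocess.run(command, check=True)` on a machine where `rga` is on the PATH.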
That’s it. After a successful build, we get the disassembly in the output folder:
; -------- Disassembly --------------------
shader main
asic(GFX10)
type(PS)
sgpr_count(6)
vgpr_count(8)
wave_size(64)
s_inst_prefetch 0x0003 // 000000000000: BFA00003
s_mov_b32 m0, s2 // 000000000004: BEFC0302
v_interp_p1_f32 v2, v0, attr0.x // 000000000008: C8080000
v_interp_p1_f32 v3, v0, attr0.y // 00000000000C: C80C0100
v_interp_p1_f32 v4, v0, attr0.z // 000000000010: C8100200
v_interp_p1_f32 v0, v0, attr0.w // 000000000014: C8000300
v_interp_p2_f32 v2, v1, attr0.x // 000000000018: C8090001
v_interp_p2_f32 v3, v1, attr0.y // 00000000001C: C80D0101
v_interp_p2_f32 v4, v1, attr0.z // 000000000020: C8110201
v_interp_p2_f32 v0, v1, attr0.w // 000000000024: C8010301
v_cvt_pkrtz_f16_f32 v2, v2, v3 // 000000000028: 5E040702
v_cvt_pkrtz_f16_f32 v3, v4, v0 // 00000000002C: 5E060104
exp mrt0, v2, v2, v3, v3 done compr vm // 000000000030: F8001C0F 00000302
s_endpgm // 000000000038: BF810000
s_code_end // 00000000003C: BF9F0000
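A disassembly listing like this one is easy to summarize with a short script, for example to track register counts or code size across shader revisions. The Python sketch below reads the header fields and counts instructions and encoded bytes. The `summarize` function is a hypothetical helper, and the regular expressions are modeled on the listing above only; the exact text layout may differ between RGA versions and target GPUs.

```python
import re
from collections import Counter

DISASSEMBLY = """\
; -------- Disassembly --------------------
shader main
  asic(GFX10)
  type(PS)
  sgpr_count(6)
  vgpr_count(8)
  wave_size(64)
  s_inst_prefetch 0x0003 // 000000000000: BFA00003
  s_mov_b32 m0, s2 // 000000000004: BEFC0302
  v_interp_p1_f32 v2, v0, attr0.x // 000000000008: C8080000
  v_interp_p1_f32 v3, v0, attr0.y // 00000000000C: C80C0100
  v_interp_p1_f32 v4, v0, attr0.z // 000000000010: C8100200
  v_interp_p1_f32 v0, v0, attr0.w // 000000000014: C8000300
  v_interp_p2_f32 v2, v1, attr0.x // 000000000018: C8090001
  v_interp_p2_f32 v3, v1, attr0.y // 00000000001C: C80D0101
  v_interp_p2_f32 v4, v1, attr0.z // 000000000020: C8110201
  v_interp_p2_f32 v0, v1, attr0.w // 000000000024: C8010301
  v_cvt_pkrtz_f16_f32 v2, v2, v3 // 000000000028: 5E040702
  v_cvt_pkrtz_f16_f32 v3, v4, v0 // 00000000002C: 5E060104
  exp mrt0, v2, v2, v3, v3 done compr vm // 000000000030: F8001C0F 00000302
  s_endpgm // 000000000038: BF810000
  s_code_end // 00000000003C: BF9F0000
"""

# Header fields such as "vgpr_count(8)".
HEADER_RE = re.compile(r"^(asic|type|sgpr_count|vgpr_count|wave_size)\((\w+)\)$")
# Instruction lines: mnemonic, operands, "// <offset>: <one or more dwords>".
INSTR_RE = re.compile(r"^(\w+).*//\s*([0-9A-Fa-f]{12}):((?:\s+[0-9A-Fa-f]{8})+)$")


def summarize(text: str):
    """Return (header fields, mnemonic counts, encoded size in bytes)."""
    header, mnemonics, code_bytes = {}, Counter(), 0
    for raw in text.splitlines():
        line = raw.strip()
        m = HEADER_RE.match(line)
        if m:
            key, value = m.groups()
            header[key] = int(value) if value.isdigit() else value
            continue
        m = INSTR_RE.match(line)
        if m:
            mnemonics[m.group(1)] += 1
            code_bytes += 4 * len(m.group(3).split())  # 4 bytes per dword
    return header, mnemonics, code_bytes


header, mnemonics, code_bytes = summarize(DISASSEMBLY)
print(header)
print(sum(mnemonics.values()), "instructions,", code_bytes, "bytes")
```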
In addition to the --isa option that generates the disassembly, you can use the -a option to generate hardware resource usage statistics for each shader in the pipeline, or the --livereg option to create a live VGPR analysis report based on the generated disassembly.
For more information about the available options, run rga -s dx12 -h.
Acknowledgements
Code samples used herein are from Microsoft’s DirectX Graphics Samples and are © Microsoft 2015 and subject to the MIT License.
Resources
Radeon™ GPU Analyzer
Radeon GPU Analyzer is an offline compiler and performance analysis tool for DirectX®, Vulkan®, SPIR-V™, OpenGL® and OpenCL™.
Using Radeon™ GPU Analyzer with Direct3D®12 Compute
Radeon GPU Analyzer (RGA) has support for DirectX12 compute shaders with the command line tool. This mode can generate GCN/RDNA ISA disassembly for your compute shaders, regardless of the physically installed GPU.
Radeon™ GPU Analyzer – Visual Studio® Code Extension
This is a Visual Studio® Code extension for Radeon GPU Analyzer (RGA) to allow you to use RGA directly from within VS Code.