Background
DirectX®12 requires a complete pipeline state definition to compile a pipeline. This involves locating all of the pipeline's shaders, defining a root signature, and, for graphics, defining a subset of the graphics pipeline state. Having to prepare every element of the graphics or compute pipeline upfront made offline compilation of DirectX®12 shaders somewhat tedious, particularly when you only want to compile a single shader in isolation.
RGA v2.9.1 to the rescue
RGA v2.9.1 streamlines the shader compilation experience by allowing you to compile a single D3D12 shader. When given an incomplete DirectX®12 pipeline, RGA v2.9.1 auto-generates the missing elements for you. These elements can be the root signature, the graphics pipeline state subset, or even other shaders in the pipeline. In effect, every input other than the single shader you want to compile becomes optional.
Usage example
Consider the following pixel shader:
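The listing itself is not reproduced in this text. As a stand-in, here is a minimal sketch of a pixel shader that would be consistent with the auto-generated files shown later in this post: one texture bound at t0, one sampler bound at s0, a float2 TEXCOORD0 input, and a single render target. The names g_texture, g_sampler, and PsInput are assumptions made for illustration and are not taken from the original example.

```hlsl
// Hypothetical stand-in for the example pixel shader (not the original listing).
// Resource bindings chosen to match the auto-generated root signature: SRV(t0), Sampler(s0).
Texture2D    g_texture : register(t0);
SamplerState g_sampler : register(s0);

// Input layout chosen to match the auto-generated vertex shader's output struct.
struct PsInput
{
    float4 position : SV_POSITION;
    float2 uv       : TEXCOORD0;
};

// Samples the texture at the interpolated coordinates and writes to render target 0.
float4 main(PsInput input) : SV_TARGET
{
    return g_texture.Sample(g_sampler, input.uv);
}
```

A shader of this shape performs a single 2D texture sample from interpolated coordinates and exports one color, which is what the disassembly at the end of this post does.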
Previously, to compile this pixel shader, you had to define the entire graphics pipeline: the accompanying vertex shader, a root signature, and the subset of the graphics pipeline state.
With RGA v2.9.1, you can compile that pixel shader in isolation. The command line invocation does not change: you use the same RGA DirectX®12 command as before and simply omit the missing pieces of the D3D12 graphics pipeline. In this example, the pixel shader is compiled for an AMD Radeon RX 7000 series (RDNA 3 architecture) GPU, using the gfx1100 target.
This produces the following output:
Building for gfx1100...
Auto-generating root signature using reflection into C:\RGA-2.9.1\Generated\rga_autogen_20240415_174858_.rootsig ... success.
Auto-generating graphics pipeline state using reflection into C:\RGA-2.9.1\Generated\rga_autogen_20240415_174858_.gpso ... success.
Auto-generating vertex shader using reflection into C:\RGA-2.9.1\Generated\rga_autogen_20240415_174858_vs.hlsl ... success.
Performing front-end compilation of vertex shader through DXC...
Front-end compilation success.
Performing front-end compilation of pixel shader through DXC...
Front-end compilation success.
Performing front-end compilation of root signature through DXC...
Front-end compilation success.
Compiling graphics pipeline...
Extracting vertex shader disassembly...
vertex shader disassembly extracted successfully.
Extracting pixel shader disassembly...
pixel shader disassembly extracted successfully.
succeeded.
RGA detects which parts of the graphics pipeline are missing and auto-generates them via reflection.
A dedicated command line argument, --autogen-dir <folder>, has been introduced, which allows you to specify a folder in which the auto-generated files will be stored. By default, these files are deleted after compilation.
In our example, RGA will automatically generate a vertex shader, a textual representation of the root signature and a .gpso file containing the subset of the graphics pipeline state. The textual representation of the root signature allows you to investigate compilation issues. It also allows you to easily tweak the auto-generated files and recompile.
RGA uses reflection to ensure that every file it auto-generates matches your input pixel shader, both in terms of its pipeline interface (e.g., vertex format, vertex attributes to interpolate, render targets) and its resource bindings (the buffers and textures used by the shader).
Auto-generated HLSL Vertex Shader:
// Auto-generated with Radeon GPU Analyzer (RGA).
struct VsInput
{
    float4 attribute0: POSITION0;
};
struct VsOutput
{
    float4 attribute0: SV_POSITION;
    float2 attribute1: TEXCOORD0;
};
void main(VsInput input, out VsOutput output)
{
    float4 result = float4(0.0, 0.0, 0.0, 0.0);
    result += float4(input.attribute0.xyzw);
    output.attribute0 = float4(result.xyzw);
    output.attribute1 = float2(result.xy);
}
Auto-generated text-based Root Signature:
// Auto-generated with Radeon GPU Analyzer (RGA).
#define RGA_ROOT_SIGNATURE \
"RootFlags( ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT | DENY_HULL_SHADER_ROOT_ACCESS " \
"| DENY_DOMAIN_SHADER_ROOT_ACCESS | DENY_GEOMETRY_SHADER_ROOT_ACCESS ), " \
"DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL), " \
"DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL)"
Auto-generated .gpso file containing the subset of the graphics pipeline state:
# Auto-generated with Radeon GPU Analyzer (RGA).
# schemaVersion
1.0
# InputLayoutNumElements (the number of D3D12_INPUT_ELEMENT_DESC elements in the D3D12_INPUT_LAYOUT_DESC structure - must match the following "InputLayout" section)
1
# InputLayout ( {SemanticName, SemanticIndex, Format, InputSlot, AlignedByteOffset, InputSlotClass, InstanceDataStepRate } )
{ "POSITION", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
# PrimitiveTopologyType (the D3D12_PRIMITIVE_TOPOLOGY_TYPE value to be used when creating the PSO)
D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE
# NumRenderTargets (the number of formats in the upcoming RTVFormats section)
1
# RTVFormats (an array of DXGI_FORMAT-typed values for the render target formats - the number of items in the array should match the above NumRenderTargets section)
{ DXGI_FORMAT_R8G8B8A8_UNORM }
Once the missing pieces of the D3D12 graphics pipeline have been auto-generated, RGA invokes the AMD shader compiler, passing in the pixel shader along with those files, to compile the entire pipeline.
Upon successful compilation, you get the relevant pixel shader disassembly:
; D3D12 Shader Hash 0x30d77570b6e44f6c49553fb9ca32e72d
; API PSO Hash 0xd5b60f61c55df988
; Driver Internal Pipeline Hash 0xf9f385166a76e0d7
; -------- Disassembly --------------------
shader main
asic(GFX11)
type(PS)
sgpr_count(14)
vgpr_count(8)
wave_size(64)
// s_ps_state in s0
s_version UC_VERSION_GFX11 | UC_VERSION_W64_BIT // 000000000000: B0802006
s_set_inst_prefetch_distance 0x0003 // 000000000004: BF840003
s_mov_b32 m0, s4 // 000000000008: BEFD0004
s_mov_b64 s[12:13], exec // 00000000000C: BE8C017E
s_wqm_b64 exec, exec // 000000000010: BEFE1D7E
s_getpc_b64 s[0:1] // 000000000014: BE804780
s_waitcnt_depctr depctr_vm_vsrc(0) & depctr_va_vdst(0) // 000000000018: BF880F83
lds_param_load v2, attr0.x wait_vdst:0 // 00000000001C: CE000002
lds_param_load v3, attr0.y wait_vdst:0 // 000000000020: CE000103
s_mov_b32 s4, s3 // 000000000024: BE840003
s_mov_b32 s5, s1 // 000000000028: BE850001
s_mov_b32 s0, s2 // 00000000002C: BE800002
s_load_b256 s[4:11], s[4:5], null // 000000000030: F40C0102 F8000000
s_load_b128 s[0:3], s[0:1], null // 000000000038: F4080000 F8000000
v_interp_p10_f32 v4, v2, v0, v2 wait_exp:1 // 000000000040: CD000104 040A0102
v_interp_p10_f32 v0, v3, v0, v3 wait_exp:0 // 000000000048: CD000000 040E0103
s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2) // 000000000050: BF870112
v_interp_p2_f32 v2, v2, v1, v4 wait_exp:7 // 000000000054: CD010702 04120302
v_interp_p2_f32 v0, v3, v1, v0 wait_exp:7 // 00000000005C: CD010700 04020303
s_and_b64 exec, exec, s[12:13] // 000000000064: 8BFE0C7E
s_waitcnt lgkmcnt(0) // 000000000068: BF89FC07
image_sample v[0:3], [v2,v0], s[4:11], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D // 00000000006C: F06C0F05 00010002 00000000
s_waitcnt vmcnt(0) // 000000000078: BF8903F7
v_cvt_pk_rtz_f16_f32 v0, v0, v1 // 00000000007C: 5E000300
v_cvt_pk_rtz_f16_f32 v2, v2, v3 // 000000000080: 5E040702
s_mov_b64 exec, s[12:13] // 000000000084: BEFE010C
exp mrt0, v0, v2, off, off done // 000000000088: F8000803 00000200
s_endpgm // 000000000090: BFB00000
s_code_end // 000000000094: BF9F0000
s_code_end // 000000000098: BF9F0000
s_code_end // 00000000009C: BF9F0000
s_code_end // 0000000000A0: BF9F0000
end
Conclusion
In summary, RGA v2.9.1 simplifies DirectX®12 offline shader compilation and analysis and makes it easier for you to quickly investigate single shaders.
Get the Radeon Developer Tool Suite today!
You can find out more about RGA, including links to the release binaries on GitHub and the full release notes list, on our product page.
Your feedback is incredibly valuable to us and helps drive the RGA roadmap. For feature requests or feedback, get in touch on GitHub!