
The AMD GPU Services (AGS) library provides software developers with the ability to query AMD GPU software and hardware state information that is not normally available through standard operating systems or graphics APIs.
This tutorial explains how to use Radeon GPU Analyzer (RGA) to produce a live VGPR analysis report for your shaders and kernels. Basic RGA usage knowledge is assumed.
By performing a live register analysis on your shaders and kernels, you can identify code blocks with higher VGPR pressure, and opportunities for register usage optimizations.
The live register analysis determines “live” registers, that is, all registers which contain values that will be consumed by subsequent instructions. The maximum number of live registers is thus the lower bound on how many registers need to be allocated.
The analysis computes the live register set by building the control flow graph directly from the ISA disassembly, and propagating the read/write information through it. Every read is propagated “up” through the control flow graph until a write is encountered. This produces the live range, which starts with a write, and ends with a read instruction.
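The propagation described above can be illustrated with a small backward data-flow pass. This is only a sketch of the general technique, not RGA's actual implementation: the `Instr` and `Block` types, the `live_registers` function, and the hand-written read/write sets are all hypothetical, and the sketch reports the classic "live before each instruction" set, which may not match the exact counting convention of RGA's report.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    text: str
    reads: frozenset = frozenset()   # registers consumed by the instruction
    writes: frozenset = frozenset()  # registers defined by the instruction


@dataclass
class Block:
    instrs: list
    succs: list = field(default_factory=list)  # indices of successor blocks


def _scan_block(block, live_out):
    """Walk one block bottom-up; return the live set before each instruction."""
    live = set(live_out)
    per_instr = []
    for ins in reversed(block.instrs):
        # A write ends the upward propagation; a read starts (or continues) it.
        live = (live - ins.writes) | ins.reads
        per_instr.append(set(live))
    per_instr.reverse()
    return per_instr


def live_registers(blocks):
    """Iterate to a fixed point over the CFG, then return per-instruction live sets."""
    live_in = [set() for _ in blocks]

    def live_out(i):
        out = set()
        for s in blocks[i].succs:
            out |= live_in[s]
        return out

    changed = True
    while changed:
        changed = False
        for i in reversed(range(len(blocks))):
            scanned = _scan_block(blocks[i], live_out(i))
            new_in = scanned[0] if scanned else live_out(i)
            if new_in != live_in[i]:
                live_in[i] = new_in
                changed = True
    return [_scan_block(b, live_out(i)) for i, b in enumerate(blocks)]


# Straight-line example with hand-annotated VGPR reads and writes.
example = [Block([
    Instr("v_mul_f32 v0, s4, v8", frozenset({"v8"}), frozenset({"v0"})),
    Instr("v_mac_f32 v0, s5, v9", frozenset({"v0", "v9"}), frozenset({"v0"})),
    Instr("exp param0, v0, off, off, off", frozenset({"v0"})),
])]
live = live_registers(example)
print(max(len(s) for block in live for s in block))
```

The largest live set found this way is the lower bound on the number of registers that must be allocated; the fixed-point loop is what lets the same code handle branches and loops.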
To generate a live VGPR analysis report for any type of shader or kernel, add the --livereg switch to your command. Make sure the command also contains the --isa switch: the live register report is generated by processing the GCN ISA disassembly, so without --isa there is no disassembly to analyze.
The following command will generate a live VGPR analysis report for a Vulkan™ vertex shader:
rga -s vulkan --vert ~/Vertex1.vert --isa ~/isa_output.txt --livereg ~/livereg_report.txt
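When reports are generated for many shaders, the command can be assembled in a script. The helper below is a hypothetical convenience wrapper, not part of RGA; it uses only the switches shown in the command above and assumes the `rga` executable is on the PATH when the command is eventually run.

```python
import shlex


def build_livereg_command(shader, isa_out, report_out):
    """Return the argument list for a Vulkan vertex-shader live VGPR report."""
    return [
        "rga", "-s", "vulkan",
        "--vert", shader,
        "--isa", isa_out,          # required: the analysis reads the disassembly
        "--livereg", report_out,
    ]


cmd = build_livereg_command("Vertex1.vert", "isa_output.txt", "livereg_report.txt")
print(shlex.join(cmd))
# To actually run it (requires RGA to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```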
Let’s take a look at the following live register analysis report, which was generated for a DirectX®11 vertex shader:
 1 |  9 |     ::::::: :: | label_basic_block_1: s_swappc_b64 s[2:3], s[2:3]
 2 |  9 |     ::::::: :: | s_andn2_b32 s0, s9, 0x3fff0000
 3 |  9 |     ::::::: :: | s_mov_b32 s1, s0
 4 |  9 |     ::::::: :: | s_mov_b32 s2, s10
 5 |  9 |     ::::::: :: | s_mov_b32 s3, s11
 6 |  9 |     ::::::: :: | s_mov_b32 s0, s8
 7 |  9 |     ::::::: :: | s_buffer_load_dwordx8 s[4:11], s[0:3], 0x00
 8 |  9 |     ::::::: :: | s_buffer_load_dwordx8 s[12:19], s[0:3], 0x20
 9 |  9 |     ::::::: :: | s_waitcnt lgkmcnt(0)
10 | 10 | ^   v:::::: :: | v_mul_f32 v0, s4, v4
11 | 11 | :^  v:::::: :: | v_mul_f32 v1, s8, v4
12 | 12 | ::^ v:::::: :: | v_mul_f32 v2, s12, v4
13 | 13 | :::^v:::::: :: | v_mul_f32 v3, s16, v4
14 | 12 | x::: v::::: :: | v_mac_f32 v0, s5, v5
15 | 12 | :x:: v::::: :: | v_mac_f32 v1, s9, v5
16 | 12 | ::x: v::::: :: | v_mac_f32 v2, s13, v5
17 | 12 | :::x v::::: :: | v_mac_f32 v3, s17, v5
18 | 11 | x:::  v:::: :: | v_mac_f32 v0, s6, v6
19 | 11 | :x::  v:::: :: | v_mac_f32 v1, s10, v6
20 | 11 | ::x:  v:::: :: | v_mac_f32 v2, s14, v6
21 | 11 | :::x  v:::: :: | v_mac_f32 v3, s18, v6
22 | 10 | x:::   v::: :: | v_mac_f32 v0, s7, v7
23 | 10 | :x::   v::: :: | v_mac_f32 v1, s11, v7
24 | 10 | ::x:   v::: :: | v_mac_f32 v2, s15, v7
25 | 10 | :::x   v::: :: | v_mac_f32 v3, s19, v7
26 |  9 | vvvv    ::: :: | exp pos0, v0, v1, v2, v3
27 |  5 |         ::: :: | s_buffer_load_dwordx4 s[4:7], s[0:3], 0x40
28 |  5 |         ::: :: | s_buffer_load_dwordx4 s[8:11], s[0:3], 0x50
29 |  5 |         ::: :: | s_buffer_load_dwordx4 s[0:3], s[0:3], 0x60
30 |  5 |         ::: :: | s_waitcnt expcnt(0)
31 |  6 | ^       v:: :: | v_mul_f32 v0, s4, v8
32 |  7 | :^      v:: :: | v_mul_f32 v1, s8, v8
33 |  8 | ::^     v:: :: | v_mul_f32 v2, s0, v8
34 |  7 | x::      v: :: | v_mac_f32 v0, s5, v9
35 |  7 | :x:      v: :: | v_mac_f32 v1, s9, v9
36 |  7 | ::x      v: :: | v_mac_f32 v2, s1, v9
37 |  6 | x::       v :: | v_mac_f32 v0, s6, v10
38 |  6 | :x:       v :: | v_mac_f32 v1, s10, v10
39 |  6 | ::x       v :: | v_mac_f32 v2, s2, v10
40 |  5 | vvv         :: | exp param0, v0, v1, v2, off
41 |  2 |             vv | exp param1, v12, v13, off, off
42 |  0 |                | s_endpgm
Maximum # VGPR used 13, # VGPR allocated: 14
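To locate the instructions with the highest VGPR pressure in a larger report, the rows can be parsed with a short script. This is a minimal sketch that assumes each row has the four "|"-separated fields seen in the report above: line number, live VGPR count, per-register map, and instruction. Judging from the instructions on each row, `^` appears to mark a register being written, `v` one being read, `x` one being both read and written, and `:` one that stays live across the instruction; that reading is inferred from the report, and the function names are hypothetical.

```python
def parse_livereg_report(text):
    """Return (line_no, live_count, vgpr_map, instruction) for each report row."""
    rows = []
    for line in text.splitlines():
        parts = line.split("|", 3)
        if len(parts) != 4:
            continue  # skips the summary line and anything that is not a row
        num, count, vgpr_map, instr = parts
        if not (num.strip().isdigit() and count.strip().isdigit()):
            continue
        rows.append((int(num), int(count), vgpr_map, instr.strip()))
    return rows


def peak_pressure(rows):
    """Return the highest live count and the rows that reach it."""
    if not rows:
        return 0, []
    peak = max(count for _, count, _, _ in rows)
    return peak, [row for row in rows if row[1] == peak]


sample = """\
12 | 12 | ::^ v:::::: :: | v_mul_f32 v2, s12, v4
13 | 13 | :::^v:::::: :: | v_mul_f32 v3, s16, v4
14 | 12 | x::: v::::: :: | v_mac_f32 v0, s5, v5
Maximum # VGPR used 13, # VGPR allocated: 14
"""
peak, hottest = peak_pressure(parse_livereg_report(sample))
for line_no, count, _, instr in hottest:
    print(f"line {line_no}: {count} live VGPRs at '{instr}'")
```

In the report above, the peak of 13 live VGPRs falls on line 13, which agrees with the "Maximum # VGPR used" figure in the summary line.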
Report structure:
Remarks
Contributing to RGA
The source code for RGA’s live register analysis engine is available on GitHub.
Radeon GPU Analyzer is an offline compiler and performance analysis tool for DirectX®, Vulkan®, SPIR-V™, OpenGL® and OpenCL™.