Live VGPR Analysis with Radeon™ GPU Analyzer

This tutorial explains how to use Radeon GPU Analyzer (RGA) to produce a live VGPR analysis report for your shaders and kernels. Basic RGA usage knowledge is assumed.

Motivation

By performing a live register analysis on your shaders and kernels, you can identify code blocks with high VGPR pressure and spot opportunities to reduce register usage.

Background

The live register analysis determines “live” registers, that is, all registers which contain values that will be consumed by subsequent instructions. The maximum number of live registers is thus the lower bound on how many registers need to be allocated.
The analysis computes the live register set by building the control flow graph directly from the ISA disassembly, and propagating the read/write information through it. Every read is propagated “up” through the control flow graph until a write is encountered. This produces the live range, which starts with a write, and ends with a read instruction.
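
The propagation described above can be sketched as a standard backward dataflow pass. The Python below is an illustrative sketch only, not RGA's implementation: the `Instr` type, the `liveness` function and the hand-written read/write sets are hypothetical, and the sketch simply iterates to a fixed point instead of reproducing RGA's actual traversal. Counting a register as occupied when it is live before or after an instruction is also an assumption, chosen because it matches the counts in the sample report later in this post.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """One ISA instruction, reduced to the VGPRs it touches (hypothetical type)."""
    text: str
    reads: frozenset   # VGPRs read by the instruction
    writes: frozenset  # VGPRs written by the instruction
    succs: tuple       # indices of the instructions that may execute next


def liveness(instrs):
    """Return (live_in, live_out) per instruction via backward propagation.

    live_out[i] = union of live_in over all successors of i
    live_in[i]  = reads[i] | (live_out[i] - writes[i])
    A read therefore travels "up" the graph until a write stops it.
    """
    n = len(instrs)
    live_in = [frozenset()] * n
    live_out = [frozenset()] * n
    changed = True
    while changed:  # repeat until nothing changes (needed for loops in the graph)
        changed = False
        for i in range(n - 1, -1, -1):
            ins = instrs[i]
            out = frozenset().union(*(live_in[s] for s in ins.succs))
            inn = frozenset(ins.reads) | (out - frozenset(ins.writes))
            if inn != live_in[i] or out != live_out[i]:
                live_in[i], live_out[i] = inn, out
                changed = True
    return live_in, live_out


# Three instructions in the style of the sample report, straight-line code.
program = [
    Instr("v_mul_f32 v0, s4, v8", frozenset({"v8"}), frozenset({"v0"}), (1,)),
    Instr("v_mac_f32 v0, s5, v9", frozenset({"v0", "v9"}), frozenset({"v0"}), (2,)),
    Instr("exp param0, v0, off, off, off", frozenset({"v0"}), frozenset(), ()),
]

live_in, live_out = liveness(program)
# Registers occupied at each instruction: live before it or after it.
counts = [len(i | o) for i, o in zip(live_in, live_out)]
print(counts)  # [3, 2, 1]
```

The largest value in `counts` is the lower bound on the number of registers that must be allocated for this snippet.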

Usage

To generate a live VGPR analysis report for any type of shader or kernel, add the --livereg switch to your command. Make sure that your command also contains the --isa switch, because the live register report is generated by processing the GCN ISA disassembly. Without --isa, there is no GCN ISA disassembly to analyze.

Example

The following command will generate a live VGPR analysis report for a Vulkan™ vertex shader:
rga -s vulkan --vert ~/Vertex1.vert --isa ~/isa_output.txt --livereg ~/livereg_report.txt
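
If you generate reports from a build script, the same command can be assembled programmatically. The sketch below is a hypothetical helper (`build_livereg_command` is not part of RGA) that uses only the switches shown in the command above. It assumes that `rga` is on your PATH, and the exact switches may differ between RGA versions.

```python
import shlex


def build_livereg_command(shader_path, isa_path, report_path):
    """Build the argument list for a Vulkan vertex shader live VGPR report.

    --isa is always included, because the live register report is derived
    from the ISA disassembly.
    """
    return [
        "rga", "-s", "vulkan",
        "--vert", shader_path,
        "--isa", isa_path,
        "--livereg", report_path,
    ]


cmd = build_livereg_command("Vertex1.vert", "isa_output.txt", "livereg_report.txt")
# Print the command for inspection; pass `cmd` to subprocess.run() to execute it.
print(shlex.join(cmd))
```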

Output Interpretation

Let’s take a look at the following live register analysis report, which was generated for a DirectX®11 vertex shader:

    1 |   9 |     ::::::: :: | label_basic_block_1: s_swappc_b64 s[2:3], s[2:3]
    2 |   9 |     ::::::: :: | s_andn2_b32 s0, s9, 0x3fff0000
    3 |   9 |     ::::::: :: | s_mov_b32 s1, s0
    4 |   9 |     ::::::: :: | s_mov_b32 s2, s10
    5 |   9 |     ::::::: :: | s_mov_b32 s3, s11
    6 |   9 |     ::::::: :: | s_mov_b32 s0, s8
    7 |   9 |     ::::::: :: | s_buffer_load_dwordx8 s[4:11], s[0:3], 0x00
    8 |   9 |     ::::::: :: | s_buffer_load_dwordx8 s[12:19], s[0:3], 0x20
    9 |   9 |     ::::::: :: | s_waitcnt lgkmcnt(0)
   10 |  10 | ^   v:::::: :: | v_mul_f32 v0, s4, v4
   11 |  11 | :^  v:::::: :: | v_mul_f32 v1, s8, v4
   12 |  12 | ::^ v:::::: :: | v_mul_f32 v2, s12, v4
   13 |  13 | :::^v:::::: :: | v_mul_f32 v3, s16, v4
   14 |  12 | x::: v::::: :: | v_mac_f32 v0, s5, v5
   15 |  12 | :x:: v::::: :: | v_mac_f32 v1, s9, v5
   16 |  12 | ::x: v::::: :: | v_mac_f32 v2, s13, v5
   17 |  12 | :::x v::::: :: | v_mac_f32 v3, s17, v5
   18 |  11 | x:::  v:::: :: | v_mac_f32 v0, s6, v6
   19 |  11 | :x::  v:::: :: | v_mac_f32 v1, s10, v6
   20 |  11 | ::x:  v:::: :: | v_mac_f32 v2, s14, v6
   21 |  11 | :::x  v:::: :: | v_mac_f32 v3, s18, v6
   22 |  10 | x:::   v::: :: | v_mac_f32 v0, s7, v7
   23 |  10 | :x::   v::: :: | v_mac_f32 v1, s11, v7
   24 |  10 | ::x:   v::: :: | v_mac_f32 v2, s15, v7
   25 |  10 | :::x   v::: :: | v_mac_f32 v3, s19, v7
   26 |   9 | vvvv    ::: :: | exp pos0, v0, v1, v2, v3
   27 |   5 |         ::: :: | s_buffer_load_dwordx4 s[4:7], s[0:3], 0x40
   28 |   5 |         ::: :: | s_buffer_load_dwordx4 s[8:11], s[0:3], 0x50
   29 |   5 |         ::: :: | s_buffer_load_dwordx4 s[0:3], s[0:3], 0x60
   30 |   5 |         ::: :: | s_waitcnt expcnt(0)
   31 |   6 | ^       v:: :: | v_mul_f32 v0, s4, v8
   32 |   7 | :^      v:: :: | v_mul_f32 v1, s8, v8
   33 |   8 | ::^     v:: :: | v_mul_f32 v2, s0, v8
   34 |   7 | x::      v: :: | v_mac_f32 v0, s5, v9
   35 |   7 | :x:      v: :: | v_mac_f32 v1, s9, v9
   36 |   7 | ::x      v: :: | v_mac_f32 v2, s1, v9
   37 |   6 | x::       v :: | v_mac_f32 v0, s6, v10
   38 |   6 | :x:       v :: | v_mac_f32 v1, s10, v10
   39 |   6 | ::x       v :: | v_mac_f32 v2, s2, v10
   40 |   5 | vvv         :: | exp param0, v0, v1, v2, off
   41 |   2 |             vv | exp param1, v12, v13, off, off
   42 |   0 |                | s_endpgm 

Maximum # VGPR used  13, # VGPR allocated:  14

Report structure:

  • First (leftmost) column: a running number which represents the code line number
  • Second column: the number of live VGPRs at that point of the program’s execution
  • Third column: symbols that represent the status of each register. The i’th symbol refers to the i’th register:
    • ‘:’ means that the register is kept alive, while it is not actively being used by the current instruction
    • ‘^’ means that the current instruction writes to the register
    • ‘v’ means that the current instruction reads from the register
    • ‘x’ means that the current instruction both reads from the register and writes to it
  • Fourth column: the disassembly of the current instruction
  • The bottom line of the report presents a summary of the number of VGPRs which were actually used by the shader, and the number of VGPRs which were allocated for it.
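
The column layout described above is simple enough to post-process. The following Python is a minimal sketch of a parser, not an official RGA utility: the names `ReportLine`, `parse_line` and `registers_with` are hypothetical, and the code assumes that the status column is padded with exactly one space on each side, as in the sample report. The line is split on the first three `|` characters only, so that any `|` inside the disassembly itself stays in the instruction text.

```python
from typing import NamedTuple, Optional


class ReportLine(NamedTuple):
    number: int       # first column: running line number
    live_count: int   # second column: number of live VGPRs
    status: str       # third column: one symbol per VGPR, index i -> v<i>
    instruction: str  # fourth column: disassembly


def parse_line(line: str) -> Optional[ReportLine]:
    """Parse one report row; return None for anything else (e.g. the summary)."""
    parts = line.split("|", 3)
    if len(parts) != 4:
        return None
    number, count, status, instruction = parts
    if not (number.strip().isdigit() and count.strip().isdigit()):
        return None
    # Assumption: one padding space on each side of the status column.
    status = status[1:-1] if len(status) >= 2 else ""
    return ReportLine(int(number), int(count), status, instruction.strip())


def registers_with(status: str, symbols: str) -> list:
    """Names of the VGPRs whose status symbol is one of `symbols`."""
    return [f"v{i}" for i, ch in enumerate(status) if ch in symbols]


# A few rows copied from the sample report above.
sample = [
    "    9 |   9 |     ::::::: :: | s_waitcnt lgkmcnt(0)",
    "   13 |  13 | :::^v:::::: :: | v_mul_f32 v3, s16, v4",
    "   26 |   9 | vvvv    ::: :: | exp pos0, v0, v1, v2, v3",
    "   42 |   0 |                | s_endpgm ",
]

rows = [r for r in map(parse_line, sample) if r is not None]
peak = max(rows, key=lambda r: r.live_count)
print(peak.number, peak.live_count)          # 13 13
print(registers_with(peak.status, "^x"))     # ['v3']  (written)
print(registers_with(peak.status, "vx"))     # ['v4']  (read)
```

Sorting the parsed rows by `live_count` is a quick way to find the instructions where VGPR pressure peaks.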

Remarks

  • The analysis takes branches in the code into account, and assumes that either path can be taken. As a result, live registers may appear “out of nowhere” at a label. This is by design.
  • The analysis only looks at VGPRs, not SGPRs. Many instructions consume scalar registers, but these are ignored because there are generally more than enough scalar registers, and scalar registers are not the limiting factor for occupancy on GCN.
  • Some registers appear live when the program starts. These are generally pre-loaded: for instance, in a vertex shader, the fetch shader loads data into registers before the shader starts.

Contributing to RGA

The source code for RGA’s live register analysis engine is available on GitHub.

Resources

RGA

Radeon™ GPU Analyzer

Radeon GPU Analyzer is an offline compiler and performance analysis tool for DirectX®, Vulkan®, SPIR-V™, OpenGL® and OpenCL™.

Amit Ben-Moshe

Amit Ben-Moshe is a Technical Lead and a Principal Member of Technical Staff at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.
