Radeon GPU Profiler


An incredible game experience is not a given. It is the result of game designers carefully engineering each scene and each frame to get the best performance out of the hardware it runs on. Meet the Radeon GPU Profiler, a ground-breaking low-level optimization tool that provides detailed timing and occupancy information on Radeon GPUs.

Unlike the black box approach of the past, PC game developers now have unprecedented, in-depth access to a GPU and can easily analyze graphics, async compute usage, event timing, pipeline stalls, barriers, bottlenecks and other performance inefficiencies.

This unique tool generates easy-to-understand visualizations of how your DirectX®12, Vulkan®, and OpenCL™ applications interact with the GPU at the hardware level. Profiling a game is a quick and simple process using the Radeon Developer Panel and our public GPU driver.

Radeon GPU Profiler is provided as a binary package and can be downloaded as a GitHub release at this link.

Key Features

Low level GPU timing data for:

  • Barriers
  • Queue signals and waits
  • Wavefront occupancy
  • Context roll stalls
  • Event timings
  • Pipeline state
  • Instruction timing

Supported GPUs

  • Tonga R9 285, R9 380
  • Radeon RX 400 and RX 500
  • Radeon R9 Fury, Fury X and Fury Nano
  • Ryzen 5 2400G and Ryzen 3 2200G Processors with Radeon Vega Graphics
  • Radeon RX Vega
  • Radeon VII
  • Radeon RX 5700 and RX 5700 XT
  • Radeon RX 5500 series and RX 5300 series

Supported graphics APIs

  • DirectX 12
  • Vulkan

Supported compute APIs

  • OpenCL

Supported OSs

  • Windows 7
  • Windows 10
  • Linux – Ubuntu 18.04.3 LTS


Wavefront Occupancy

The wavefront occupancy view displays how many wavefronts were pushed through the GPU. This indicates how close we got to the theoretical maximum, and applies to both graphics and async compute wavefronts. We can also correlate between wavefronts and the GPU events which launched them. The data displayed in this view is highly filterable, groupable, and includes a side panel with added detail about user selections.

Frame Summary

The frame summary view provides a bird’s eye view of how command buffers were submitted to each GPU queue. This includes graphics, async compute, and copy queues. This allows users to understand how busy the GPU was over the course of a few frames, and also to see command buffers executing in parallel across queues. Synchronization objects (signal/wait) are also shown to help users analyze cross-queue synchronization.


Barriers

The barriers view indicates how expensive each barrier is from a GCN perspective. We gain insight on which parts of the pipeline got flushed, which caches got invalidated, whether decompresses were triggered, barrier durations, and whether the driver had to inject additional barriers.

Most Expensive Events

The most expensive events view is a quick way to determine which GPU events are consuming the most frame time. It displays durations, which pipeline stages were active, and shows user event strings to help pinpoint the work being performed.

Context Rolls

The context rolls view helps users understand the cost of stalls due to pipeline state changes. This includes an analysis of context pressure, identification of which draws were responsible for each context roll, and how redundant pipeline state changes were.

Event Timing

The event timing pane shows a listing of all GPU events in the frame. It is similar to an API trace, but limited to events which consumed GPU time. This view can also group its events in different ways (e.g. by command buffer or user marker) and displays a duration for each event on the side.
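
To make the grouping idea concrete, here is a minimal sketch of what summing event durations per user marker amounts to. RGP performs this kind of grouping inside its UI; the code below is only an illustration, and the function name, the `(marker, duration)` tuple layout, and the sample values are all hypothetical rather than an RGP API or export format.

```python
from collections import defaultdict


def total_duration_by_marker(events):
    """Sum event durations per user marker, most expensive group first.

    `events` is an iterable of (marker_name, duration_in_microseconds)
    pairs. This layout is an assumption made for illustration only.
    """
    totals = defaultdict(float)
    for marker, duration_us in events:
        totals[marker] += duration_us
    # Sort groups by total duration, largest first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data: marker names and timings are made up.
events = [
    ("Shadows", 120.0),
    ("GBuffer", 310.5),
    ("Shadows", 80.0),
    ("Lighting", 450.0),
]

ranked = total_duration_by_marker(events)
for marker, total_us in ranked:
    print(f"{marker}: {total_us:.1f} us")
```

Grouping by marker before ranking tends to be more useful than ranking single events, because a pass made of many small draws can cost more in total than one large dispatch.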

Pipeline State

The pipeline state view shows the same list of GPU events that is found under event timing and adds an interactive graphics pipeline on the right. Users may inspect the state of the hardware for both fixed function stages and programmable shader stages. This also provides insight on how to reduce GPR pressure to achieve higher wavefront occupancy.
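
The link between register pressure and occupancy can be sketched with a short calculation. The example below assumes the commonly documented GCN limits: at most 10 wavefronts per SIMD, a budget of 256 vector registers (VGPRs) per SIMD lane, and VGPR allocation in blocks of 4. It deliberately ignores the other limiters (SGPRs, LDS, and so on), and the limits differ on RDNA-based GPUs such as the RX 5000 series, so treat it as a rough estimate rather than what RGP itself computes. The function name is hypothetical.

```python
MAX_WAVES_PER_SIMD = 10      # assumed GCN limit on resident wavefronts per SIMD
VGPRS_PER_SIMD_LANE = 256    # assumed GCN VGPR budget per lane, per SIMD
VGPR_ALLOC_GRANULARITY = 4   # assumed allocation block size for VGPRs


def waves_per_simd(vgprs_used: int) -> int:
    """Estimate VGPR-limited wavefront occupancy per SIMD on GCN."""
    if not 0 < vgprs_used <= VGPRS_PER_SIMD_LANE:
        raise ValueError("vgprs_used must be between 1 and 256")
    # Round the register count up to the allocation granularity.
    blocks = -(-vgprs_used // VGPR_ALLOC_GRANULARITY)
    allocated = blocks * VGPR_ALLOC_GRANULARITY
    # Occupancy is capped by both the register budget and the hardware limit.
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD_LANE // allocated)


for vgprs in (24, 25, 64, 128):
    print(f"{vgprs} VGPRs -> {waves_per_simd(vgprs)} waves per SIMD")
```

Under these assumptions, a shader using 24 VGPRs can reach the full 10 waves per SIMD, while one extra register drops it to 9, which is why trimming a few registers near a threshold can raise occupancy.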

Instruction Timing

The instruction timing view shows the average issue latency of each instruction of a single shader. The instruction timing information is generated using hardware support on AMD GPUs, so it does not require recompiling shaders or inserting any instrumentation into them.

Render Targets

The render and depth targets view exposes all targets that were written to in the frame. This includes a timeline that lays out each target as it gets written to by the GPU, plus related workloads including compute, clears, and barriers. Beneath the timeline is a listing of all targets that shows their characteristics.


Pipelines

The pipelines view shows all pipelines used by the frame. Information about each pipeline is shown, including detailed information about the shaders contained in the pipeline and the events which use the pipeline.

RenderDoc/RGP Interop

RGP features interoperability with RenderDoc, the widely used third-party frame debugger. We have augmented RenderDoc with the ability to generate RGP profiles, allowing users to use both tools side by side. This includes under-the-hood event mapping mechanisms that allow users to jump automatically between events across both tools. The end result is an understanding of the correlation between low-level GPU work and the final rendered scene.

Technical Blogs

GPUOpen technical blogs about the Radeon GPU Profiler

How-To Video


Optimization with Radeon GPU Profiler