It’s time for another release of Radeon™ GPU Profiler (RGP)! 

RGP is our ground-breaking low-level optimization tool that provides detailed information on Radeon™ GPUs.

This latest release, version 1.10, adds support for cache counters and Vulkan® ray tracing.

And there’s more too! Read on to find out.

Cache counters

While RGP has always used low-level data from Radeon GPUs to give you insight into your game’s performance, we have taken this a bit further in this release. With the introduction of the cache counters, you can now see how your frame accesses the various cache levels of the GPU’s memory hierarchy. RGP can now visualize data from the Level 0, Level 1, and Level 2 memory caches, as well as data from the shader core’s instruction and scalar caches.

Built on the GPU hardware’s streaming performance metrics (SPM), this data provides unique insight into what is happening on the hardware, and gives you the opportunity to optimize a shader’s memory access patterns to achieve higher performance.

This functionality is supported on RDNA and RDNA 2 hardware, so you’ll need to have a Radeon RX 5000 series GPU or newer. To collect this data, just make sure the Collect cache counters option is checked in Radeon Developer Panel when you capture a profile.

When you open the captured profile in RGP, the Wavefront Occupancy pane has a new UI element which visualizes the cache data. Below, you can see the five cache statistics that are collected. The graphs plot the hit percentages for each cache across the frame timeline. Using this view, you can correlate the cache statistics with both the wavefront occupancy and the event timeline.

Here’s how it looks in RGP (see the area outlined in red in the screenshot below):

Simply hover the mouse over the graph to view more details about each cache at that point in the timeline, including the hit percentage and the number of cache requests, hits, and misses.

You can enable or disable the graph for any of the caches using the Counters dropdown, which appears above the graphs. Here, we have deselected the Instruction and Scalar caches, so the graph shows only the L0, L1, and L2 cache counters:

You can also fill in the area under one or more of the graphs by clicking on that item’s color box in the legend below the graphs.

Here, we have clicked the purple L0 cache hit color box and the red L1 cache hit color box, to fill in the area under both line graphs:

Want a more verbose description of each statistic? Simply hover the mouse over the counter name in the legend.
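If you want to reason about the hover statistics yourself, the relationship between them is simple arithmetic. The sketch below is only an illustration and is not part of RGP: the `CacheSample` class is hypothetical, and it assumes that every cache request is counted as either a hit or a miss.

```python
from dataclasses import dataclass


@dataclass
class CacheSample:
    """Hypothetical container for one cache's counters at one point in time."""
    hits: int
    misses: int

    @property
    def requests(self) -> int:
        # Assumption: each request is counted as exactly one hit or one miss.
        return self.hits + self.misses

    @property
    def hit_percentage(self) -> float:
        # Report 0% rather than dividing by zero when the cache saw no traffic.
        if self.requests == 0:
            return 0.0
        return 100.0 * self.hits / self.requests


# Made-up numbers, purely for illustration.
sample = CacheSample(hits=750, misses=250)
print(f"{sample.requests} requests, {sample.hit_percentage:.1f}% hits")
```

A low hit percentage on a busy cache is usually more interesting than the same percentage on a cache that served only a handful of requests, which is why it helps to read the percentage together with the request count.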

Vulkan® ray tracing

This release adds support for Vulkan® ray tracing, building on the DirectX® raytracing support introduced in the 1.9 release. In fact, the Vulkan® ray tracing support is nearly identical to the DirectX® raytracing support.

When visualizing profile data from an application that uses the Vulkan® ray tracing API, you will see one or more of the following events:

  • vkCmdTraceRaysKHR<Indirect>
  • vkCmdTraceRaysKHR<Unified>
  • vkCmdTraceRaysIndirectKHR<Indirect>
  • vkCmdTraceRaysIndirectKHR<Unified>
  • vkCmdBuildAccelerationStructuresKHR
  • vkCmdBuildAccelerationStructuresIndirectKHR
  • vkCmdCopyAccelerationStructureKHR
  • vkCmdCopyAccelerationStructureToMemoryKHR
  • vkCmdCopyMemoryToAccelerationStructureKHR

The exact events you see will depend on how the application being profiled uses the Vulkan® ray tracing API. You can read more about the ray tracing support in this article, which also contains a description of the difference between the <Indirect> and <Unified> variants of vkCmdTraceRaysKHR and vkCmdTraceRaysIndirectKHR.
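If you copy event names out of RGP and want to group them in your own scripts, the naming pattern listed above is easy to split apart. The following is a minimal sketch, not an RGP API: `split_event_name` is a hypothetical helper, and it assumes the names appear as plain text in exactly the form shown in the list.

```python
import re
from typing import Optional, Tuple

# An API call name, optionally followed by a variant in angle brackets,
# for example "vkCmdTraceRaysKHR<Indirect>".
_EVENT_RE = re.compile(r"^(?P<call>\w+)(?:<(?P<variant>\w+)>)?$")


def split_event_name(name: str) -> Tuple[str, Optional[str]]:
    """Split an event name into (api_call, variant); variant may be None."""
    match = _EVENT_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized event name: {name!r}")
    return match.group("call"), match.group("variant")


print(split_event_name("vkCmdTraceRaysKHR<Indirect>"))
print(split_event_name("vkCmdCopyAccelerationStructureKHR"))
```

Events without a variant, such as the acceleration structure build and copy commands, come back with `None` in the second position.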

Other updates

There have also been several other minor updates:

  • The events table in the Most expensive events pane now has an extra column showing the Work duration. The Work duration (which is also shown in the Details panel of the Wavefront occupancy and Event timing panes) indicates how long shaders are actually running for an event. With this column in the Most expensive events table, you can easily see whether an event is expensive because of running shaders or for some other reason.
  • There have been some significant performance enhancements that should make the UI more responsive when viewing a large ray tracing profile.
  • We have also updated and fixed clipboard support in several parts of the RGP user interface – if you previously ran into issues copying some data from the UI to the clipboard, you may want to try again with this release.
  • The PIX3 marker support in RGP has been updated to support the latest WinPIXEventRuntime headers – please check out the latest documentation for instructions on using PIX markers with RGP.
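To make the Work duration comparison from the first item above concrete, here is a small sketch of the reasoning. It is not how RGP computes or exports anything: the `EventTiming` type, the field names, the 0.5 threshold, and the timing numbers are all made up for illustration.

```python
from typing import List, NamedTuple


class EventTiming(NamedTuple):
    """Hypothetical record for one event; durations are in microseconds."""
    name: str
    duration_us: float       # total time the event took
    work_duration_us: float  # time shaders were actually running


def work_fraction(event: EventTiming) -> float:
    """Fraction of the event's duration spent running shaders."""
    if event.duration_us <= 0:
        return 0.0
    return event.work_duration_us / event.duration_us


def mostly_not_shader_work(events: List[EventTiming],
                           threshold: float = 0.5) -> List[str]:
    """Names of events whose shader work is below `threshold` of the total."""
    return [e.name for e in events if work_fraction(e) < threshold]


# Made-up timings, purely for illustration.
events = [
    EventTiming("vkCmdDispatch", duration_us=200.0, work_duration_us=190.0),
    EventTiming("vkCmdDraw", duration_us=300.0, work_duration_us=90.0),
]
print(mostly_not_shader_work(events))
```

Events flagged this way are the ones where the Work duration explains only a small part of the total, so they are good candidates for a closer look at something other than the shaders themselves.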

And as is the case with all RGP releases, there have been plenty of other changes intended to improve quality in this release.

Please check out the RGP product page, where you can find links to download RGP 1.10.  And please feel free to reach out to us via the Issues section on the RGP page on GitHub.  We value all feedback provided.

Further reading

Radeon™ GPU Profiler

RGP gives you unprecedented, in-depth access to a GPU. Easily analyze graphics, async compute usage, event timing, pipeline stalls, barriers, bottlenecks, and other performance inefficiencies.