Unified Radeon™ GPU Profiler and Radeon™ Memory Visualizer usage with Radeon™ Developer Panel 2.1

Introduction to the Radeon™ Developer Panel

The Radeon™ Developer Panel (RDP) is a key component of our GPU tools strategy. RDP provides a communication channel that delivers requests to, and receives data from, the driver. 

When RDP is connected to a Radeon™ Adrenalin driver, it enables developer mode, which allows the generation of the event timing data used by the Radeon™ GPU Profiler (RGP) and the memory usage data used by the Radeon™ Memory Visualizer (RMV).

Radeon™ Memory Visualizer

Radeon™ Memory Visualizer (RMV) is a tool to allow you to gain a deep understanding of how your application uses memory for graphics resources.

Radeon™ GPU Profiler

RGP gives you unprecedented, in-depth access to a GPU. Easily analyze graphics, async compute usage, event timing, pipeline stalls, barriers, bottlenecks, and other performance inefficiencies.

The panel can connect to a Radeon™ Adrenalin driver running on the local computer, or it can connect to a driver running on a remote computer. 

Image 1: Shows the four steps to generating RMV and RGP data from a local computer

To support remote connection a small executable called the Radeon™ Developer Service (RDS) must be running on the remote computer to provide the connection end point.

The resulting data files (profile and memory trace files) are saved to disk by RDP, which then acts like a file browser that can open them in either RMV or RGP.

Image 3: Shows how profile and memory trace files can be opened in their client viewer tool (RMV or RGP)

The Radeon™ Developer Panel has been redesigned to provide an extensible user interface that can accommodate new tool features when they become available.

It provides two modes of operation, Basic and Advanced:

  • Basic mode provides a simple-to-use and streamlined user experience that can generate profile and memory data in as little as four mouse clicks. This mode can support most of your needs.
  • Advanced mode allows the customization of your workflow on a per-application basis and will help developers who work with multiple executables or have more complex application startup modes (e.g. game launchers). 

 

What’s new in RDP 2.1?

Previous versions of the panel each supported only one tool: RDP version 1.7 supported only Radeon™ GPU Profiler (RGP) functionality, and RDP version 2.0 supported only the Radeon™ Memory Visualizer (RMV). In addition, the two panels had different UIs and workflows, leading to a less than ideal user experience.

The good news is that RDP 2.1 unifies the RMV and RGP functionality from the earlier panels and provides a single workflow experience. RDP 2.1 is also built on totally new developer driver code that provides better tools support, stability, and overall user experience.

Here is a summary of the main improvements in RDP 2.1:

  1. Now supports both RGP profiling and RMV memory tracing.
  2. Can generate RGP profiles and an RMV trace in the same app session.
  3. Simplified instruction tracing.
  4. RMV snapshot marker insertion.
  5. Introduces customizable and user created workflows.

For more information on all aspects of RDP 2.1, please read the full RDP help documentation.

Let’s look at the changes in more detail.

Getting started with RDP 2.1

The connection pane remains the same as in RDP 2.0 and once you have connected for the first time it will auto-connect on startup thereafter.

Image 4: The panel (disconnected) on the very first run

On connection with the Radeon™ driver, the panel switches to the System tab, where it waits to auto-detect the startup of a DirectX® 12, Vulkan®, or OpenCL™ application.

Image 5: The panel connected to the developer driver waiting for an application to start

On detecting the startup of an application using one of the supported APIs, the panel switches to the Applications pane where you can see the name of the application you are analyzing and the features that are available for use. 

The executable name, graphics API and its process ID are displayed in the left-hand column and on the right-hand side are tabs, one for each feature – Profiling, Memory Trace, and Device Clocks.

You can switch between the tabs while your application is running, allowing you to generate profiles, a memory trace, and change clock rates in the same session.

Image 6: The Applications tab's main UI elements

Profiling your applications

To generate an RGP profile file, click on the Capture profile button. It’s that easy!

The resulting profile is displayed in the Recently collected profiles list. By default, profiles are stored in the user’s documents location, in the “rgp_profiles” directory. A further sub-directory is created for each unique executable name and is used to store profiles from that application.

The full path location is displayed in the Profiling pane. Right-clicking on a profile in the list opens a popup menu that allows you to easily navigate to the profile’s file location. 

Image 7: Shows the right-click context menu used for additional file operations
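
The directory layout described above lends itself to simple scripting. The following is a minimal Python sketch, not part of RDP, that groups captured profiles by executable. It assumes the profiles live under Documents/rgp_profiles in the user's home directory and that profile files use the .rgp extension. Neither detail is stated in this article, so adjust both to match the full path shown in the Profiling pane.

```python
from pathlib import Path


def list_profiles(root: Path) -> dict[str, list[Path]]:
    """Group profile files under `root` by per-executable sub-directory.

    Assumption: each executable has its own sub-directory (as the article
    describes) and profile files end in ".rgp" (not stated in the article).
    """
    if not root.is_dir():
        return {}
    grouped: dict[str, list[Path]] = {}
    for exe_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(exe_dir.glob("*.rgp"))
        if files:  # skip executables with no captured profiles
            grouped[exe_dir.name] = files
    return grouped


if __name__ == "__main__":
    # Assumed default location; check the path shown in the Profiling pane.
    profiles_root = Path.home() / "Documents" / "rgp_profiles"
    for executable, files in list_profiles(profiles_root).items():
        print(f"{executable}: {len(files)} profile(s)")
        for f in files:
            print(f"  {f.name}")
```

A script like this can help when archiving or comparing profiles from several builds of the same application.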

Simpler instruction tracing

RDP 1.7 was limited to generating instruction timing data for one specific PSO at a time and the workflow required taking multiple profiles. With RDP 2.1 it is now possible to gather instruction timing data for all PSOs in a single profile (limited to a single shader engine). 

To use this feature, check the “Enable instruction tracing” option prior to capturing a profile. When a profile is captured, it will now contain detailed instruction timing data.

Image 8: Shows the Enable instruction trace checkbox as being set

Memory tracing your applications

You can switch between tool tabs while your application is running, allowing you to take profiles, switch to the Memory Trace tab, and generate an RMV memory trace.

Here we can see a live DirectX® 12 application. The process ID is displayed on the left.

To generate a memory trace, click the Dump Trace button or shut down your application. The deletion of the graphics device during shutdown will trigger the dumping of the current memory trace data.

Image 9: Shows the Memory Trace tab

An important thing to remember is that memory tracing begins when the application starts, and ends when you click on the Dump Trace button or when the application is closed.

While your application is running you may take many RGP profiles, but there can only be one RMV trace file.

Please note that after a trace is generated, the Dump Trace button is no longer active and it is not possible to create another during the current session.

Image 10: Shows the memory trace file added to the file list on completion of the trace
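
The session rules above (tracing starts with the application, any number of profiles, exactly one trace) can be summarized as a small state model. This is a hypothetical Python sketch for illustration only. The class and method names are invented here and do not correspond to any RDP API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    """Hypothetical model of one application session, per the article."""

    tracing: bool = True              # memory tracing begins at application start
    trace_file: Optional[str] = None  # at most one RMV trace per session
    profiles: list = field(default_factory=list)

    def capture_profile(self) -> str:
        """Profiles can be captured any number of times while the app runs."""
        name = f"profile_{len(self.profiles) + 1}"
        self.profiles.append(name)
        return name

    def dump_trace(self) -> bool:
        """Write the trace once; later calls do nothing and return False."""
        if not self.tracing:
            return False  # the Dump Trace button is no longer active
        self.tracing = False
        self.trace_file = "memory_trace"  # placeholder name for illustration
        return True

    def close_application(self) -> None:
        """Closing the application dumps the trace if that has not happened yet."""
        self.dump_trace()


if __name__ == "__main__":
    session = Session()
    session.capture_profile()
    session.capture_profile()
    print(session.dump_trace())   # first dump succeeds
    print(session.dump_trace())   # second dump is refused
    print(len(session.profiles))  # profiles are unaffected by the dump
```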

RMV Snapshot marker insertion

While your application is running, the Insert snapshot feature can be used to add snapshots into the memory trace.

For example, adding a snapshot before and after a game level loads can be used to detect a content memory leak (e.g. forgetting to unload content from GPU memory). Simply type a string in the text box and click insert. 

Image 11: Shows the user string “Snapshot 1” being added to the memory trace

On loading the memory trace in RMV, the snapshots will appear already populated in the Snapshot Timeline.

Image 12: Shows three snapshots that were added to the memory trace using the panel while the application was running

Device Clocks

The clock controls are contained in the Device Clocks tool tab, and to use them your target application must be running. If no application is running, then the two clock modes will not appear in the UI.

When active, the clocks UI allows you to switch between:

  • Default clock mode – fluctuating on demand.
  • Stable clock mode – fixed at a lower but stable clock rate.

Changing the clock mode has an immediate effect.

Please note that both Default and Stable modes are overridden when an RGP profile is taken. When profiling, the GPU is set to run at a stable peak clock rate for several frames while the profile is being captured.

Image 13: Shows the Clocks UI with Default clock mode selected

Introducing Workflows

Workflows allow you to control which features are active during a session. In addition, the settings of each tool can be customized within a workflow.

The panel ships with three basic workflows:

  1. Default – Supports both profiling and memory tracing in the same session.
  2. Profiling – Supports only profiling in a session.
  3. Memory Trace – Supports only memory tracing in a session.

In the System tab, the Workflow pull-down allows you to select which workflow suits your task the best. The workflow selection is sticky and will remain the next time you start up RDP 2.1. 

Most simple use cases will only require one of the three basic workflows. The Workflow selection is global and will be applied to any application that you analyze.

Image 14: Shows the three workflows that ship with RDP 2.1

Why are there three workflows? Surely the Default workflow covers all the bases?

Profiling and memory tracing each add a small processing overhead, so it is desirable to reduce the number of active tools. If you only want to profile your application, then it is best to use just the Profiling workflow.

If you have several builds of an application (e.g. one optimized for rendering, and one optimized for GPU memory footprint) you might find yourself switching the global workflow back and forth between Profiling and Memory Trace. Luckily, RDP 2.1 offers a solution to this.

What is Advanced Mode?

RDP 2.1 ships with Advanced Mode turned off. We call this Basic mode, and the selected workflow is applied to all applications being analyzed. 

With Advanced Mode turned on you can:

  1. Assign specific workflows to different executables. For example, assigning the Profiling workflow to an executable optimized for rendering, and assigning the Memory Trace workflow to an executable optimized for GPU memory footprint.
  2. Specify an application name and a desired API (DX12®, Vulkan®, or OpenCL™) to act as a process filter. This is useful if you have an executable that can render using either of the graphics APIs but you only want to analyze it when using one, for example Vulkan®. When the executable is running using DX12®, it will be ignored if you specify Vulkan® as the API.
  3. Customize existing workflows or create your own new ones!

Image 15: Shows three different applications, each using a different workflow

With Advanced Mode On, each executable can be assigned a different workflow. This avoids having to remember to switch the global mode when switching between different applications.

Another feature of Advanced Mode is that the panel will only work with the applications in the Executable name list, and all other applications will be ignored.

In contrast, when in Basic mode, the panel will work with any application that uses one of the supported APIs. This aspect of Basic mode can lead to difficulties when starting an application via a launcher that itself uses one of the supported APIs. In this case, the panel will detect and attach to the launcher executable first and ignore the application you wish to analyze.

To permanently prevent an application, such as a launcher, from being acted upon by the panel, you can add the executable name to the Blocked applications list. The panel’s default list of blocked applications already contains many executables that are known to acquire the panel’s focus. Alternatively, you can use advanced mode to exclude all but the named executables from being acted upon by the panel. 
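
The difference between Basic and Advanced mode can be illustrated as a decision function. The Python sketch below is a conceptual model only, not RDP's implementation. All type and function names are hypothetical, executable names are matched exactly, and the Blocked applications list is assumed to apply in both modes, which this article does not confirm.

```python
from dataclasses import dataclass, field
from typing import Optional

SUPPORTED_APIS = {"DX12", "Vulkan", "OpenCL"}


@dataclass
class AppRule:
    """One row of the Advanced mode application list (hypothetical model)."""
    executable: str
    api: str
    workflow: str


@dataclass
class PanelConfig:
    advanced_mode: bool = False
    global_workflow: str = "Default"            # used in Basic mode
    blocked: set = field(default_factory=set)   # Blocked applications list
    rules: list = field(default_factory=list)   # used in Advanced mode


def select_workflow(cfg: PanelConfig, executable: str, api: str) -> Optional[str]:
    """Return the workflow to apply, or None if the process is ignored."""
    if api not in SUPPORTED_APIS:
        return None
    if executable in cfg.blocked:  # assumption: checked in both modes
        return None
    if not cfg.advanced_mode:
        # Basic mode: any application using a supported API is accepted.
        return cfg.global_workflow
    # Advanced mode: only listed executables with a matching API are accepted.
    for rule in cfg.rules:
        if rule.executable == executable and rule.api == api:
            return rule.workflow
    return None


if __name__ == "__main__":
    basic = PanelConfig(blocked={"Launcher.exe"})
    print(select_workflow(basic, "Launcher.exe", "DX12"))  # ignored (blocked)
    print(select_workflow(basic, "Game.exe", "DX12"))      # global workflow

    advanced = PanelConfig(
        advanced_mode=True,
        rules=[AppRule("Game.exe", "Vulkan", "Profiling")],
    )
    print(select_workflow(advanced, "Game.exe", "Vulkan"))  # per-app workflow
    print(select_workflow(advanced, "Game.exe", "DX12"))    # ignored (API filter)
```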

Customized Workflows

Why would I want to make my own workflow?

There are some cases where workflows can be very powerful. As an example, let’s use an OpenCL™ application and customize a workflow for it.

In the screenshot above (image 15), the OpenCL™ app MatrixMulDispatchCounter is using the Default workflow.

To edit a workflow:

  1. Go to the “System” / “My workflows” UI shown in the screenshot here. The Default workflow will be automatically selected (highlighted in gray).
  2. In the Workflow Settings pane below, click on the OpenCL tab to reveal the settings specific to that API.  

The default settings show that “Enable auto capture” is disabled. This means the user will need to click on the Capture profile button to generate a profile. Once the button is clicked, the profile runs until 50 dispatches have been counted.

Image 16: Shows the workflow editor being used to change the Profiler settings in the Default workflow

However, it is reasonable to assume that the user might have an application that requires auto-capture of just a single dispatch. In this situation we could just edit this workflow’s settings, or alternatively we can create a new workflow specifically for this application.

At the top of the UI is a text entry box where you can enter the name of your new workflow and an “Add” button to create it. 

The example here (image 17) shows a new workflow called “SingleDispatch_OCL” that has the Profiler enabled and the other tools disabled. The OpenCL™ settings have been set to use auto capture to profile the first dispatch.

Image 17: A new workflow has been created in the workflow editor

Switching to the My applications menu (with Advanced Mode On), we can now assign this new workflow to the MatrixMulDispatchCounter executable.

Now, as soon as the application starts up the panel will auto-capture a profile of the first dispatch. 

Image 18: Shows the user created workflow "SingleDispatch_OCL" assigned to MatrixMulDispatchCounter executable

In the screenshot here (image 19), we can see that the new profile is smaller in file size than an earlier one taken with 50 dispatches, reflecting the fact it only contains data from a single dispatch. 

Image 19: Two profiles have been captured, each using a different workflow (Default and SingleDispatch_OCL)

We can confirm the change by opening the two profiles in RGP, as shown below (image 20 and image 21).

Image 20: Shows a 50 dispatch profile initiated manually
Image 21: Shows a single dispatch profile triggered automatically on startup
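
To make the effect of the two OpenCL™ settings concrete, the following Python sketch models which dispatches end up in a profile. It is an illustration under stated assumptions, not RDP code. The names are hypothetical, dispatches are numbered from zero, and auto capture is assumed to begin at the very first dispatch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OpenCLCaptureSettings:
    """Hypothetical model of the two workflow settings discussed above."""
    auto_capture: bool
    dispatch_count: int


def captured_dispatches(
    settings: OpenCLCaptureSettings,
    total_dispatches: int,
    clicked_at: Optional[int] = None,
) -> range:
    """Return the 0-based indices of the dispatches included in the profile.

    `clicked_at` is the index of the first dispatch issued after the user
    clicks Capture profile; it is ignored when auto capture is enabled.
    """
    if settings.auto_capture:
        start = 0                    # assumption: capture starts at startup
    elif clicked_at is None:
        return range(0)              # nothing captured without a click
    else:
        start = clicked_at
    # The capture stops early if the application runs out of dispatches.
    end = min(start + settings.dispatch_count, total_dispatches)
    return range(start, max(start, end))


if __name__ == "__main__":
    default = OpenCLCaptureSettings(auto_capture=False, dispatch_count=50)
    single = OpenCLCaptureSettings(auto_capture=True, dispatch_count=1)
    print(len(captured_dispatches(default, 200, clicked_at=120)))
    print(list(captured_dispatches(single, 200)))
```

In this model the Default workflow yields a 50-dispatch profile only after a manual click, while the SingleDispatch_OCL workflow yields a one-dispatch profile with no user interaction.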

Workflows will become even more useful as we add more features to the Radeon™ Developer Panel in the future.

That’s it for now!

We hope you find RDP 2.1 useful in your development, and don’t hesitate to send us feedback on how you think it can be improved. Please provide feedback on RDP here

Once again, you can find the RDP 2.1 help documentation here:
