The AMD GPU Performance API (GPUPerfAPI, or GPA) is a powerful library, providing access to GPU Performance Counters. It can help analyze the performance and execution characteristics of applications using a Radeon™ GPU.
GPUPerfAPI is used by AMD Radeon GPU Profiler, as well as several third-party tools including Microsoft PIX on Windows and RenderDoc.
Download the latest version - v3.17
This release is a stability release that adds the following fixes:
- OpenCL support has been re-enabled for AMD Radeon RX 7000 series hardware.
- OpenGL: GPA no longer supports Adrenalin 19.6.3 and older drivers.
- On all hardware and APIs, the following counters were renamed for clarity:
  - CSWavefronts was renamed to CSWavefrontsLaunched
  - CSThreads was renamed to CSThreadsLaunched
  - CSThreadGroups was renamed to CSThreadGroupsLaunched
- On all hardware and APIs, the following counters were removed because matching counters already exist in the GlobalMemory group:
  - CSMemUnitBusy, CSMemUnitBusyCycles, CSMemUnitStalled, CSMemUnitStalledCycles, CSWriteUnitStalled, CSWriteUnitStalledCycles
- CSALUStalledByLDS and CSALUStalledByLDSCycles are now based on per-wave cycle counts.
- On AMD Radeon RX 5000 Series and newer hardware, counters in the ComputeShader group now have simplified equations.
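Tools that store counter names as plain strings (for example in a saved capture configuration) need to follow the v3.17 renames and removals listed above. The helper below is a minimal, hypothetical sketch of such a migration step; it is not part of GPA. Renamed counters are mapped to their new names, and removed counters are reported rather than mapped, because the release notes only say that matching counters exist in the GlobalMemory group without naming them.

```python
"""Hypothetical migration helper for saved GPUPerfAPI counter-name lists (v3.17)."""

# Counters renamed in v3.17 (old name -> new name), taken from the release notes.
RENAMED = {
    "CSWavefronts": "CSWavefrontsLaunched",
    "CSThreads": "CSThreadsLaunched",
    "CSThreadGroups": "CSThreadGroupsLaunched",
}

# Counters removed in v3.17; equivalents live in the GlobalMemory group.
REMOVED = {
    "CSMemUnitBusy",
    "CSMemUnitBusyCycles",
    "CSMemUnitStalled",
    "CSMemUnitStalledCycles",
    "CSWriteUnitStalled",
    "CSWriteUnitStalledCycles",
}


def migrate_counter_names(names):
    """Return (migrated_names, removed_names) for a list of counter names.

    Renamed counters are replaced by their new names, removed counters are
    collected separately so the caller can choose a GlobalMemory counter
    instead, and every other name passes through unchanged.
    """
    migrated = []
    removed = []
    for name in names:
        if name in REMOVED:
            removed.append(name)
        else:
            migrated.append(RENAMED.get(name, name))
    return migrated, removed


kept, dropped = migrate_counter_names(
    ["TessellatorBusy", "CSWavefronts", "CSMemUnitBusy", "CSThreads"]
)
print(kept)     # ['TessellatorBusy', 'CSWavefrontsLaunched', 'CSThreadsLaunched']
print(dropped)  # ['CSMemUnitBusy']
```

Reporting removed names instead of silently dropping them keeps the decision about a replacement counter with the user.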
Benefits
- Provides a standard API for accessing GPU Performance counters for both graphics and compute workloads across multiple GPU APIs.
- Supports Vulkan™, DirectX® 12, DirectX® 11, OpenGL™ and OpenCL™.
- Supports all recent GCN™ & RDNA™-based Radeon graphics cards and APUs based on Graphics IP version 8 and newer.
- Supports both Windows® and Linux.
- Provides derived “public” counters based on raw HW counters.
- Provides access to some raw hardware counters. See Raw Hardware Counters for more information.
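The derived "public" counters mentioned above are computed from raw hardware counter values. GPA's own equation format is not described on this page, so the sketch below assumes a simple comma-separated reverse-Polish (RPN) notation: integer tokens index into the raw results, parenthesised tokens are constants. The example equation and the raw values are hypothetical and only show the idea of turning two raw counts into a busy percentage.

```python
"""Sketch: evaluating a derived counter from raw counter values.

Assumption: the derived counter is described by a comma-separated RPN
equation. This format is illustrative, not GPA's documented format.
"""

import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate_rpn(equation, raw):
    """Evaluate an RPN equation against a list of raw counter values."""
    stack = []
    for token in equation.split(","):
        token = token.strip()
        if token in OPS:
            rhs = stack.pop()
            lhs = stack.pop()
            if token == "/" and rhs == 0:
                # Design choice: treat x / 0 as 0 so an idle block reports 0
                # instead of raising ZeroDivisionError.
                stack.append(0.0)
            else:
                stack.append(OPS[token](lhs, rhs))
        elif token.startswith("(") and token.endswith(")"):
            stack.append(float(token[1:-1]))  # literal constant such as (100)
        else:
            stack.append(float(raw[int(token)]))  # index into the raw results
    if len(stack) != 1:
        raise ValueError(f"malformed equation: {equation!r}")
    return stack[0]


# Hypothetical: busy percentage = busy cycles / elapsed cycles * 100.
raw_results = [750, 1000]  # [busy_cycles, elapsed_cycles], made-up values
print(evaluate_rpn("0,1,/,(100),*", raw_results))  # 75.0
```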
Find out more
RDNA 3: Read about our tool updates in Radeon Developer Tool Suite (RDTS)
Read this high level summary of our updates to RDTS for RDNA™ 3, including other new features and improvements, plus updates to GPUPerfAPI.
GPUPerfAPI v3.7 includes Radeon™ RX 6000 support and new raytracing counters
GPUPerfAPI v3.7 brings support for Radeon™ RX 6000 series GPUs, new raytracing counters for DirectX® Raytracing, new scalar and instruction cache counters, and new raytracing High-Frequency counters in Microsoft® PIX.
Requirements
Supported GPUs
- Radeon™ RX 7000 series
- Radeon™ RX 6000 series
- Radeon™ RX 5500 series and RX 5300 series
- Radeon™ RX 5700 and RX 5700 XT
- Radeon™ VII
- Radeon™ RX Vega
- Ryzen™ 7000 Series with Radeon™ 700M Series Graphics
- Ryzen™ 5 4600H with Radeon™ Vega Graphics
- Ryzen™ 5 2400G and Ryzen™ 3 2200G Processors with Radeon™ Vega Graphics
- Radeon™ R9 Fury, Fury X and Fury Nano
- Radeon™ RX 400 and RX 500
- Tonga R9 285, R9 380
Supported graphics APIs
- DirectX® 12
- Vulkan®
- DirectX® 11
- OpenGL®
Supported compute APIs
- OpenCL™ (on Windows)
Supported OSs
- Windows® 10
- Windows® 11
- Linux® – Ubuntu 18.04 LTS
- Linux® – Ubuntu 20.04 LTS
- Linux® – Ubuntu 22.04 LTS
Version history
- Added support for additional AMD RDNA™ 3 based APUs.
- GPA’s OpenCL™ support has been temporarily disabled on AMD RDNA 3 hardware.
- Updated error checking in counter splitting to report error if counter group max is zero.
- Disabled the following counters on AMD RDNA™ 3 based hardware due to inconsistent results:
  - CBMemRead, CBColorAndMaskRead, CBMemWritten, CBColorAndMaskWritten
- Disabled the following counters on AMD RDNA™ 2 based hardware due to inconsistent results:
  - VsGsVerticesIn, VsGsPrimsIn
- Disabled the following counters on AMD RDNA™ based hardware due to inconsistent results:
  - VsGsSALUBusy, VsGsSALUBusyCycles, VsGsVALUBusy, VsGsVALUBusyCycles, VsGsVALUInstCount, VsGsSALUInstCount, PSVALUBusy, PSVALUBusyCycles, PSVALUInstCount, PSSALUBusy, PSSALUBusyCycles, PSSALUInstCount
- Output from the pre_build.py script is now generated into the build\win\ or build\linux\ directory, depending on the platform.
- Compiled binaries are now generated into the build\output\ directory.
- Updated equation for MemUnitBusyCycles.
- Updated description of LocalVidMemBytes.
- Reduced the size of a static buffer used when logging messages to avoid a compiler warning.
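The counter-splitting fix above refers to dividing the enabled counters into several passes, because each hardware counter group can only sample a limited number of counters at once. The sketch below is a hypothetical greedy splitter, not GPA's implementation; the group names, limits, and counter names are made up. It shows why a group maximum of zero must be reported as an error: such a counter can never be placed in any pass.

```python
"""Sketch: greedy assignment of counters to passes under per-group limits.

Assumptions (hypothetical): each counter belongs to exactly one group, and
each group can sample at most group_max[group] counters per pass.
"""


def split_into_passes(counters, group_of, group_max):
    """Assign counters to passes without exceeding any group's limit.

    counters:  counter names in the order they were enabled
    group_of:  dict mapping counter name -> group name
    group_max: dict mapping group name -> max counters per pass
    Returns a list of passes, each a list of counter names.
    """
    passes = []  # each entry: (counter names, {group: slots used})
    for counter in counters:
        group = group_of[counter]
        limit = group_max.get(group, 0)
        if limit <= 0:
            # Without this check the counter could never be scheduled.
            raise ValueError(f"group {group!r} has max 0; cannot schedule {counter!r}")
        for names, used in passes:
            if used.get(group, 0) < limit:
                names.append(counter)
                used[group] = used.get(group, 0) + 1
                break
        else:
            # No existing pass has a free slot in this group: open a new pass.
            passes.append(([counter], {group: 1}))
    return [names for names, _ in passes]


print(split_into_passes(
    ["a", "b", "c", "d"],
    {"a": "G1", "b": "G1", "c": "G1", "d": "G2"},
    {"G1": 2, "G2": 1},
))  # [['a', 'b', 'd'], ['c']]
```

A greedy first-fit approach keeps the enable order stable within each pass; it does not guarantee the minimum number of passes.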
- Fixed an issue on some variant hardware that would prevent enabling certain hardware counters.
- Added support for AMD Radeon RX 7700 XT and AMD Radeon RX 7800 XT graphics cards.
- Added support for additional AMD Radeon 700M Series devices.
- Improved support for multi-GPU systems.
- Added counters back to Gfx9, Gfx10, Gfx103, and Gfx11 hardware generations. These restored counters are listed below by group:
  - Timing:
    - TessellatorBusy, TessellatorBusyCycles
    - VsGsBusy, VsGsBusyCycles, VsGsTime
    - PreTessellationBusy, PreTessellationBusyCycles, PreTessellationTime
    - PostTessellationBusy, PostTessellationBusyCycles, PostTessellationTime
  - VertexGeometry:
    - VsGsVerticesIn, VsGsPrimsIn, GSVerticesOut
  - PreTessellation:
    - PreTessVerticesIn
  - PostTessellation:
    - PostTessPrimsOut
  - PrimitiveAssembly:
    - PrimitivesIn
  - TextureUnit:
    - TexTriFilteringPct, TexTriFilteringCount, NoTexTriFilteringCount
    - TexVolFilteringPct, TexVolFilteringCount, NoTexVolFilteringCount
- New counters added:
  - MemoryCache:
    - L0TagConflictReadStalledCycles, L0TagConflictWriteStalledCycles, L0TagConflictAtomicStalledCycles
- Add support for AMD Radeon RX 7000M series hardware.
- Add support for AMD Radeon RX 7000S series hardware.
- OpenCL support for AMD Radeon RX 7000 series hardware has been restored if using Adrenalin 23.3.2 or newer.
- Code has been updated to C++17 language standard.
- Fixed a regression that resulted in a crash on certain hardware variants.
- Add support for Radeon™ RX 7900 XTX and 7900 XT GPUs.
- GPA binary sizes have been reduced by approximately 75%.
- Update PreTessellation and PostTessellation counters to report results only when tessellation is in use.
- Updated to support the Adrenalin 22.7.1 driver.
- Added L2CacheHit counter to OpenGL for parity with other APIs on Radeon RX 5000 Series hardware.
- Add support for additional GPUs and APUs.
- Add support for raytracing counters in Vulkan on RDNA2 (Radeon RX 6000 Series) hardware:
- RayTriTests and RayBoxTests: These counters collect the number of ray-triangle and ray-box intersection tests, respectively.
- TotalRayTests: This counter collects the aggregated number of ray-box and ray-triangle intersection tests.
- RayTestsPerWave: This counter collects ray intersection test count at a more granular level – per wave.
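The three raytracing counter descriptions above relate to each other arithmetically. The sketch below takes TotalRayTests as the sum of RayTriTests and RayBoxTests, as its description states, and assumes RayTestsPerWave is that total divided by the number of waves that issued ray tests. The wave count and all values are hypothetical; GPA computes these counters itself.

```python
"""Sketch: how the raytracing counters relate (wave-count divisor is an assumption)."""

from dataclasses import dataclass


@dataclass
class RayTestCounts:
    ray_tri_tests: int  # value of RayTriTests
    ray_box_tests: int  # value of RayBoxTests
    waves: int          # hypothetical number of waves that issued ray tests

    @property
    def total_ray_tests(self) -> int:
        # TotalRayTests aggregates ray-box and ray-triangle tests.
        return self.ray_tri_tests + self.ray_box_tests

    @property
    def ray_tests_per_wave(self) -> float:
        # Assumed reading of RayTestsPerWave; 0.0 when no waves ran.
        return self.total_ray_tests / self.waves if self.waves else 0.0


sample = RayTestCounts(ray_tri_tests=1_200, ray_box_tests=4_800, waves=300)
print(sample.total_ray_tests)     # 6000
print(sample.ray_tests_per_wave)  # 20.0
```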
- Add support for additional GPUs and APUs, including AMD Radeon™ RX 6300, 6400, and 6500 series GPUs.
- Redefined derived counters on GCN™ (Vega), RDNA™, and RDNA™ 2 hardware.
- New entrypoint added: GpaGetDeviceGeneration.
- Add support for the GPA_OVERRIDE_LOG_LEVEL environment variable to increase or decrease logging output.
- Fixed driver version detection in OpenGL™ and DirectX® 11.
- Extensive counter validation in DirectX® 12.
- Improvements made to sample applications.
- Add support for additional GPUs and APUs, including AMD Radeon™ RX 6600 series GPUs.
- Add support for additional GPUs and APUs, including AMD Radeon™ RX 6700 series GPUs.
- Code has been updated to adhere to Google C++ Style Guide.
- New public headers have been added.
- Old headers are deprecated and will emit compile-time message.
- Projects loading GPA will need to be recompiled, but no code changes are required unless moving to the new headers.
- Improvements made to sample applications.
- Updated documentation for new codestyle (and https://github.com/GPUOpen-Tools/gpu_performance_api/issues/56)
- Add support for additional GPUs and APUs, including AMD RDNA™ 2 Radeon™ RX 6000 series GPUs.
- New RT counters for DXR workloads on AMD RDNA™ 2 Radeon™ RX 6000 series GPUs:
- RayTriTests and RayBoxTests: These counters collect the number of ray-triangle and ray-box intersection tests, respectively.
- TotalRayTests: This counter collects the aggregated number of ray-box and ray-triangle intersection tests.
- RayTestsPerWave: This counter collects ray intersection test count at a more granular level – per wave.
- New Scalar and Instruction cache counters on AMD RDNA™ Radeon™ RX 5000 series GPUs:
- Scalar cache: ScalarCacheHit, ScalarCacheRequestCount, ScalarCacheHitCount, ScalarCacheMissCount.
- Instruction cache: InstCacheHit, InstCacheRequestCount, InstCacheHitCount, InstCacheMissCount.
- Update the Vulkan® sample to remove the static link and use the system-specific Vulkan® loader.
- Remove OpenCL™ support on Linux®.
- Remove downloading the Vulkan® SDK by the build script.
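The scalar and instruction cache counters listed above come as a hit percentage plus raw request, hit, and miss counts. The sketch below assumes the percentage counters (ScalarCacheHit, InstCacheHit) equal the hit count divided by the request count, times 100, and that every request is either a hit or a miss. Both are assumptions about the counter definitions, and the sample values are hypothetical.

```python
"""Sketch: cache hit percentage from raw request/hit/miss counts (assumed definitions)."""


def hit_percentage(hit_count: int, request_count: int) -> float:
    """Percentage of requests that hit the cache; 0.0 when there were no requests."""
    if request_count == 0:
        return 0.0
    return 100.0 * hit_count / request_count


def counts_consistent(request_count: int, hit_count: int, miss_count: int) -> bool:
    """Sanity check under the assumption that requests = hits + misses."""
    return request_count == hit_count + miss_count


# Hypothetical ScalarCacheRequestCount / ScalarCacheHitCount / ScalarCacheMissCount values.
requests, hits, misses = 8_000, 7_000, 1_000
print(counts_consistent(requests, hits, misses))  # True
print(hit_percentage(hits, requests))             # 87.5
```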
- Add support for additional GPUs and APUs, including AMD Ryzen™ 4000 Series APUs.
- Add two new GFX10 GlobalMemory counters for graphics using DX12 and Vulkan®: LocalVidMemBytes and PcieBytes.
- Add VS2019 project support to CMake.
- Restructure of GPA source layout to adhere to Google style.
- Add support for additional GPUs and APUs, including Radeon™ 5500 and Radeon™ 5300 Series GPUs.
- Add DirectX®11 sample application using GPUPerfAPI.
- Add per-API static counter generation.
- Decrease in GPUPerfAPI binaries size.
- Add script to package GPUPerfAPI post-build.
- Remove ROCm/HSA support.
- Add Unicode support in GPUPerfAPI for Linux.
- Bugs Fixed:
- Fixed CMake files to respect supported build flags.
- Fixed crash when DX12 debug layer was enabled.
- Fixed an issue with loading of shader in GPA Vulkan® sample app.
- Fixed an issue in Vulkan® build with newer Vulkan® SDK with amd_shader_core_properties2 extension
- Fixed an issue with crash on unsupported Gfx6 and Gfx7 GPUs.
- Add support for additional GPUs and APUs, including Radeon 5700 Series GPUs.
- Add support for setting stable GPU clocks for DirectX11, OpenGL and OpenCL.
- Add an OpenGL sample application that uses GPUPerfAPI.
- Add basic counter validation to sample applications.
- Add support for enabling individual hardware counters that make up derived counters.
- Add two new GFX9 GlobalMemory counters for graphics: LocalVidMemBytes and PcieBytes.
- Reformat source code using clang-format.
- Update counter documentation to contain per-hardware-generation tables.
- Bugs Fixed:
- Fixed error handling in GPA_GetEnabledIndex, GPA_EnableCounterByName, and GPA_DisableCounterByName.
- Fixed an issue with Vulkan timing counters (https://github.com/GPUOpen-Tools/GPA/issues/40).
- Fixed an issue with SALUBusy counters.
- Fixed an issue with HiZQuadsCulledCount and HiZQuadsSurvivingCount counters on GFX8 GPUs.
- Fixed an issue with MemUnitBusy and MemUnitStalled counters on GFX8 GPUs.
- Fixed an issue with the VSVALUBusyCycles counter on GFX9 GPUs.
- Add support for additional GPUs and APUs.
- New CMake-based build system.
- Support building on Ubuntu 18.04.
- ROCm/HSA: uses the new librocprofiler64.so library rather than the deprecated libhsa-runtime-tools64.so library for performance counter collection.
- Timing-based counters are now reported in nanoseconds instead of milliseconds.
- New timing counter to report top-of-pipe to bottom-of-pipe duration.
- GPA now builds GoogleTest libraries on the fly rather than using prebuilt binaries.
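Because timing-based counters switched from milliseconds to nanoseconds, a tool that compares results across GPA builds needs to normalize them to one unit. The sketch below assumes the tool already knows which build produced each value; the reports_ns flag and both helpers are hypothetical and not part of GPA.

```python
"""Sketch: normalizing timing-counter values across the ms -> ns unit change."""

NS_PER_MS = 1_000_000


def to_nanoseconds(value: float, reports_ns: bool) -> float:
    """Return a timing-counter value in nanoseconds.

    reports_ns is a hypothetical flag: True for builds that report
    nanoseconds, False for older builds that reported milliseconds.
    """
    return float(value) if reports_ns else value * NS_PER_MS


def ns_to_ms(ns: float) -> float:
    """Convert nanoseconds to milliseconds for display."""
    return ns / NS_PER_MS


print(to_nanoseconds(2.5, reports_ns=False))      # 2500000.0
print(ns_to_ms(to_nanoseconds(2_500_000, True)))  # 2.5
```

Storing everything in nanoseconds internally and converting only for display avoids mixing units in comparisons.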
- Add support for additional GPUs and APUs.
- Wrapped all GPA entrypoints in try/catch to ensure unhandled exceptions do not escape the GPA library.
- Add VS2017 project files.
- Bugs Fixed:
- Fixed https://github.com/GPUOpen-Tools/GPA/issues/18.
- Fixed support for scheduling counters on multiple sessions.
- OpenGL: Fixed a bug in GPASample cleanup.
- Add support for additional GPUs and APUs.
- Usability improvements to GPAInterfaceLoader.h.
- New Vulkan and DirectX 12 sample applications.
- New GPA_GetSampleId entry point.
- New GPA_GetVersion entry point.
- Bugs Fixed:
- Fixed issues with some counters on 56CU Vega10.
- Vulkan: Fixed GPA_ContinueSampleOnCommandList.
- Vulkan: Ensure results are ready before trying to query them.
- DirectX 12: Fixed incorrect device reference counting issue.
- Add support for additional GPUs and APUs.
- Support for collecting hardware counters for Vulkan and DirectX 12 applications.
- Redesigned API to support modern graphics APIs.
- The documentation has been rewritten and is now available in HTML format.
- New counters added:
- Cycle and count-based counters in addition to existing percentage-based counters.
- New Depth Buffer memory read/write counters.
- Additional Color Buffer memory counters.
- For graphics, several global memory counters which were previously available only in the Compute Shader stage are now available generically.
- Support for setting stable GPU clocks.
- Counter Group Names can now be queried separately from Counter Descriptions.
- Counters now have a UUID which can be used to uniquely identify a counter.
- New entry point (GPA_GetFuncTable) to retrieve a table of function pointers for all GPA entry points.
- New C++ GPAInterfaceLoader.h header file provides an easy way to load and use GPA entry points.
- Bugs Fixed:
- Fixed an issue with the TessellatorBusy counter on many GFX8 GPUs.
- Fixed an issue with FlatVMemInsts and CSFlatVMemInsts counters on many GFX8 GPUs.
- Fixed an issue with the LDSInsts counter on Vega GPUs.
- Fixed some issues with Compute Shader counters on Vega GPUs.
- Some counter combinations could lead to incorrect counter results.
- Enabling counters in a certain order can lead to incorrect counter scheduling across multiple passes.
- ROCm/HSA: GPA_OpenContext crashes if libhsa-runtime64.so.1 can't be found.
- ROCm/HSA: GPA does not coexist nicely with an application that also sets the HSA_TOOLS_LIB environment variable.
- OpenGL: Fixed a crash that can occur with an incorrectly-configured OpenGL driver.
- OpenGL: Fixed some issues with OpenGL device-detection.