AMD GPU Performance API

The AMD GPU Performance API (GPUPerfAPI, or GPA) is a library that provides access to GPU performance counters. It helps analyze the performance and execution characteristics of applications running on a Radeon™ GPU.

GPUPerfAPI is used by AMD Radeon GPU Profiler, as well as several third-party tools including Microsoft PIX on Windows and RenderDoc.

Download the latest version - v3.17

This release is a stability release that adds the following fixes:

  • OpenCL support has been re-enabled for AMD Radeon RX 7000 series hardware.
  • OpenGL: GPA no longer supports Adrenalin 19.6.3 and older drivers.
  • On all hardware and APIs, the following counters were renamed for clarity:
    • CSWavefronts was renamed to CSWavefrontsLaunched
    • CSThreads was renamed to CSThreadsLaunched
    • CSThreadGroups was renamed to CSThreadGroupsLaunched
  • On all hardware and APIs, the following counters were removed because matching counters already exist in the GlobalMemory group:
    • CSMemUnitBusy, CSMemUnitBusyCycles, CSMemUnitStalled, CSMemUnitStalledCycles, CSWriteUnitStalled, CSWriteUnitStalledCycles
  • CSALUStalledByLDS and CSALUStalledByLDSCycles are now based on per-wave cycle counts.
  • On AMD Radeon RX 5000 Series and newer hardware, counters in the ComputeShader group now have simplified equations.
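Tools that store counter names in configuration files or scripts may need to translate the renamed counters listed above. The following is a minimal sketch of such a translation, not part of GPUPerfAPI: the helper name MigrateCounterName is hypothetical, and only the three renames listed in these notes are covered.

```cpp
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical helper (not a GPUPerfAPI function): returns the v3.17 name for
// a counter that was renamed in this release, or the input name unchanged.
inline std::string MigrateCounterName(std::string_view name)
{
    // Old name -> new name, taken from the rename list in the release notes.
    static const std::unordered_map<std::string_view, std::string_view> kRenamed = {
        {"CSWavefronts", "CSWavefrontsLaunched"},
        {"CSThreads", "CSThreadsLaunched"},
        {"CSThreadGroups", "CSThreadGroupsLaunched"},
    };

    const auto it = kRenamed.find(name);
    return std::string(it != kRenamed.end() ? it->second : name);
}
```

Passing each stored name through such a helper before enabling counters by name would keep older counter lists working. The removed counters are deliberately left out, since choosing a GlobalMemory replacement depends on the tool.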

Requirements

Supported GPUs

  • Radeon™ RX 7000 series
  • Radeon™ RX 6000 series
  • Radeon™ RX 5500 series and RX 5300 series
  • Radeon™ RX 5700 and RX 5700 XT
  • Radeon™ VII
  • Radeon™ RX Vega
  • Ryzen™ 7000 Series with Radeon™ 700M Series Graphics
  • Ryzen™ 5 4600H with Radeon™ Vega Graphics
  • Ryzen™ 5 2400G and Ryzen™ 3 2200G Processors with Radeon™ Vega Graphics
  • Radeon™ R9 Fury, Fury X and Fury Nano
  • Radeon™ RX 400 and RX 500
  • Radeon™ R9 285 and R9 380 (Tonga)

Supported graphics APIs

  • DirectX® 12
  • Vulkan®
  • DirectX® 11
  • OpenGL®

Supported compute APIs

  • OpenCL™ (on Windows)

Supported OSs

  • Windows® 10
  • Windows® 11
  • Linux® – Ubuntu 18.04 LTS
  • Linux® – Ubuntu 20.04 LTS
  • Linux® – Ubuntu 22.04 LTS

Version history

  • Added support for additional AMD RDNA™ 3 based APUs.
  • GPA’s OpenCL™ support has been temporarily disabled on AMD RDNA 3 hardware.
  • Updated error checking in counter splitting to report error if counter group max is zero.
  • Disabled the following counters on AMD RDNA 3 based hardware due to inconsistent results:
    • CBMemRead, CBColorAndMaskRead, CBMemWritten, CBColorAndMaskWritten
  • Disabled the following counters on AMD RDNA™ 2 based hardware due to inconsistent results:
    • VsGsVerticesIn, VsGsPrimsIn
  • Disabled the following counters on AMD RDNA™ based hardware due to inconsistent results:
    • VsGsSALUBusy, VsGsSALUBusyCycles, VsGsVALUBusy, VsGsVALUBusyCycles, VsGsVALUInstCount, VsGsSALUInstCount, PSVALUBusy, PSVALUBusyCycles, PSVALUInstCount, PSSALUBusy, PSSALUBusyCycles, PSSALUInstCount
  • Output from the pre_build.py script is now generated into the build\win\ or build\linux\ directory.
  • Compiled binaries are now generated into the build\output\ directory.
  • Updated equation for MemUnitBusyCycles.
  • Updated description of LocalVidMemBytes.
  • Reduced size of static buffer when logging messages to avoid compiler warning.
  • Fixed an issue on some variant hardware that would prevent enabling certain hardware counters.
  • Added support for AMD Radeon RX 7700 XT and AMD Radeon RX 7800 XT graphics cards.
  • Added support for additional AMD Radeon 700M Series devices.
  • Improved support for multi-GPU systems.
  • Added counters back to Gfx9, Gfx10, Gfx103, and Gfx11 hardware generations. These restored counters are listed below by group:
    • Timing: 
      • TessellatorBusy, TessellatorBusyCycles 
      • VsGsBusy, VsGsBusyCycles, VsGsTime 
      • PreTessellationBusy, PreTessellationBusyCycles, PreTessellationTime
      • PostTessellationBusy, PostTessellationBusyCycles, PostTessellationTime
    • VertexGeometry: 
      • VsGsVerticesIn, VsGsPrimsIn, GSVerticesOut
    • PreTessellation: 
      • PreTessVerticesIn
    • PostTessellation: 
      • PostTessPrimsOut
    • PrimitiveAssembly: 
      • PrimitivesIn
    • TextureUnit: 
      • TexTriFilteringPct, TexTriFilteringCount, NoTexTriFilteringCount
      • TexVolFilteringPct, TexVolFilteringCount, NoTexVolFilteringCount
  • New counters added:
    • MemoryCache: 
      • L0TagConflictReadStalledCycles, L0TagConflictWriteStalledCycles, L0TagConflictAtomicStalledCycles
  • Add support for AMD Radeon RX 7000M series hardware.
  • Add support for AMD Radeon RX 7000S series hardware.
  • OpenCL support for AMD Radeon RX 7000 series hardware has been restored if using Adrenalin 23.3.2 or newer.
  • Code has been updated to C++17 language standard.
  • Fixed a regression that resulted in a crash on certain hardware variants.
  • Add support for Radeon™ RX 7900 XTX and 7900 XT GPUs.
  • GPA binary sizes have been reduced by approximately 75%.
  • Update PreTessellation and PostTessellation counters to report results only when tessellation is in use.
  • Updated to support the Adrenalin 22.7.1 driver.
  • Added L2CacheHit counter to OpenGL for parity with other APIs on Radeon RX 5000 Series hardware.
  • Add support for additional GPUs and APUs.
  • Add support for raytracing counters in Vulkan on RDNA2 (Radeon RX 6000 Series) hardware:
    • RayTriTests and RayBoxTests: These counters collect the number of ray intersections for triangles and boxes, respectively.
    • TotalRayTests: This counter collects the aggregated number of ray-box and ray-triangle intersection tests.
    • RayTestsPerWave: This counter collects ray intersection test count at a more granular level – per wave.
  • Add support for additional GPUs and APUs, including AMD Radeon™ RX 6300, 6400, and 6500 series GPUs.
  • Redefined derived counters on GCN™ (Vega), RDNA™, and RDNA™ 2 hardware.
  • New entrypoint added: GpaGetDeviceGeneration.
  • Add support for GPA_OVERRIDE_LOG_LEVEL environment variable to increase or decrease logging output.
  • Fixed driver version detection in OpenGL™ and DirectX® 11.
  • Extensive counter validation in DirectX® 12.
  • Improvements made to sample applications.
  • Add support for additional GPUs and APUs, including AMD Radeon™ RX 6600 series GPUs.
  • Add support for additional GPUs and APUs, including AMD Radeon™ RX 6700 series GPUs.
  • Code has been updated to adhere to Google C++ Style Guide.
    • New public headers have been added.
    • Old headers are deprecated and will emit compile-time message.
    • Projects loading GPA will need to be recompiled, but no code changes are required unless moving to the new headers.
  • Improvements made to sample applications.
  • Updated documentation for the new code style (and https://github.com/GPUOpen-Tools/gpu_performance_api/issues/56).
  • Add support for additional GPUs and APUs, including AMD RDNA™ 2 Radeon™ RX 6000 series GPUs.
  • New RT counters for DXR workloads on AMD RDNA™ 2 Radeon™ RX 6000 series GPUs:
    • RayTriTests and RayBoxTests: These counters collect the number of ray intersections for triangles and boxes, respectively.
    • TotalRayTests: This counter collects the aggregated number of ray-box and ray-triangle intersection tests.
    • RayTestsPerWave: This counter collects ray intersection test count at a more granular level – per wave.
  • New Scalar and Instruction cache counters on AMD RDNA™ Radeon™ RX 5000 series GPUs:
    • Scalar cache: ScalarCacheHit, ScalarCacheRequestCount, ScalarCacheHitCount, ScalarCacheMissCount.
    • Instruction cache: InstCacheHit, InstCacheRequestCount, InstCacheHitCount, InstCacheMissCount.
  • Update the Vulkan® sample to remove the static link and use the system-specific Vulkan® loader.
  • Remove OpenCL™ support on Linux®.
  • Remove downloading the Vulkan® SDK by the build script.
  • Add support for additional GPUs and APUs, including AMD Ryzen™ 4000 Series APUs.
  • Add two new GFX10 GlobalMemory counters for graphics using DX12 and Vulkan®: LocalVidMemBytes and PcieBytes.
  • Add VS2019 project support to CMake.
  • Restructure of GPA source layout to adhere to Google style.
  • Add support for additional GPUs and APUs, including Radeon™ 5500 and Radeon™ 5300 Series GPUs.
  • Add DirectX®11 sample application using GPUPerfAPI.
  • Add per-API static counter generation.
  • Decrease the size of GPUPerfAPI binaries.
  • Add script to package GPUPerfAPI post-build.
  • Remove ROCm/HSA support.
  • Add Unicode support in GPUPerfAPI for Linux.
  • Bugs Fixed:
    • Fixed CMake files to respect supported build flags.
    • Fixed crash when DX12 debug layer was enabled.
    • Fixed an issue with loading of shader in GPA Vulkan® sample app.
    • Fixed an issue in the Vulkan® build when using a newer Vulkan® SDK with the amd_shader_core_properties2 extension.
    • Fixed a crash on unsupported Gfx6 and Gfx7 GPUs.
  • Add support for additional GPUs and APUs, including Radeon 5700 Series GPUs.
  • Add support for setting stable GPU clocks for DirectX11, OpenGL and OpenCL.
  • Add an OpenGL sample application that uses GPUPerfAPI.
  • Add basic counter validation to sample applications.
  • Add support for enabling individual hardware counters that make up derived counters.
  • Add two new GFX9 GlobalMemory counters for graphics: LocalVidMemBytes and PcieBytes.
  • Reformat source code using clang-format.
  • Update counter documentation to contain per-hardware-generation tables.
  • Bugs Fixed:
    • Fixed error handling in GPA_GetEnabledIndex, GPA_EnableCounterByName, and GPA_DisableCounterByName.
    • Fixed an issue with Vulkan timing counters (https://github.com/GPUOpen-Tools/GPA/issues/40).
    • Fixed an issue with SALUBusy counters.
    • Fixed an issue with HiZQuadsCulledCount and HiZQuadsSurvivingCount counters on GFX8 GPUs.
    • Fixed an issue with MemUnitBusy and MemUnitStalled counters on GFX8 GPUs.
    • Fixed an issue with VSVALUBusyCycles counter on GFX9 GPUs.
  • Add support for additional GPUs and APUs.
  • New CMake-based build system.
  • Support building on Ubuntu 18.04.
  • ROCm/HSA: uses new librocprofiler64.so rather than deprecated libhsa-runtime-tools64.so library for performance counter collection.
  • Timing-based counters are now reported in nanoseconds instead of milliseconds.
  • New timing counter to report top-of-pipe to bottom-of-pipe duration.
  • GPA now builds GoogleTest libraries on the fly rather than using prebuilt binaries.
  • Add support for additional GPUs and APUs.
  • Wrapped all GPA entrypoints in try/catch to ensure unhandled exceptions do not escape the GPA library.
  • Add VS2017 project files.
  • Bugs Fixed:
  • Add support for additional GPUs and APUs.
  • Usability improvements to GPAInterfaceLoader.h.
  • New Vulkan and DirectX 12 sample applications.
  • New GPA_GetSampleId entry point.
  • New GPA_GetVersion entry point.
  • Bugs Fixed:
    • Fixed issues with some counters on 56CU Vega10.
    • Vulkan: Fixed GPA_ContinueSampleOnCommandList.
    • Vulkan: Ensure results are ready before trying to query them.
    • DirectX 12: Fixed incorrect device reference counting issue.
  • Add support for additional GPUs and APUs.
  • Support for collecting hardware counters for Vulkan and DirectX 12 applications.
  • Redesigned API to support modern graphics APIs.
  • The documentation has been rewritten and is now available in HTML format.
  • New counters added:
    • Cycle and count-based counters in addition to existing percentage-based counters.
    • New Depth Buffer memory read/write counters.
    • Additional Color Buffer memory counters.
    • For graphics, several global memory counters which were previously available only in the Compute Shader stage are now available generically.
  • Support for setting stable GPU clocks.
  • Counter Group Names can now be queried separately from Counter Descriptions.
  • Counters now have a UUID which can be used to uniquely identify a counter.
  • New entry point (GPA_GetFuncTable) to retrieve a table of function pointers for all GPA entry points.
  • New C++ GPAInterfaceLoader.h header file provides an easy way to load and use GPA entry points.
  • Bugs Fixed:
    • Fixed an issue with the TessellatorBusy counter on many GFX8 GPUs.
    • Fixed an issue with FlatVMemInsts and CSFlatVMemInsts counters on many GFX8 GPUs.
    • Fixed an issue with LDSInsts counter on Vega GPUs.
    • Fixed some issues with Compute Shader counters on Vega GPUs.
    • Some counter combinations could lead to incorrect counter results.
    • Enabling counters in a certain order can lead to incorrect counter scheduling across multiple passes.
    • ROCm/HSA: GPA_OpenContext crashes if libhsa-runtime64.so.1 can’t be found.
    • ROCm/HSA: GPA does not coexist nicely with an application that also sets the HSA_TOOLS_LIB environment variable.
    • OpenGL: Fixed a crash that can occur with an incorrectly-configured OpenGL driver.
    • OpenGL: Fixed some issues with OpenGL device-detection.
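
The version history above notes that timing-based counters are now reported in nanoseconds instead of milliseconds. Code that still expects milliseconds needs a conversion step; the following is a minimal sketch, assuming the timing result has already been read into a double. The helper name is hypothetical and is not a GPUPerfAPI entry point.

```cpp
// Hypothetical helper (not a GPUPerfAPI function): converts a timing counter
// result reported in nanoseconds to milliseconds.
constexpr double NanosecondsToMilliseconds(double nanoseconds)
{
    // 1 millisecond = 1,000,000 nanoseconds.
    return nanoseconds / 1.0e6;
}
```

For example, a result of 2,500,000 ns corresponds to 2.5 ms.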

Our other SDKs

Anti-Lag 2 SDK

AMD Radeon™ Anti-Lag 2 reduces the system latency by applying frame alignment between the CPU and GPU jobs.

AMD Capsaicin Framework

Capsaicin is a Direct3D12 framework for real-time graphics research which implements the GI-1.0 technique and a reference path-tracer.

AMD Render Pipeline Shaders (RPS) SDK

The Render Pipeline Shaders (RPS) SDK provides a framework for graphics engines to use Render Graphs with explicit APIs.

AMD Device Library eXtra

ADLX is a modern library designed to access features and functionality of AMD systems such as Display, 3D graphics, Performance Monitoring, GPU Tuning, and more.

AMD Brotli-G SDK

Brotli-G is an open-source compression/decompression standard for digital assets (based on Brotli) that is compatible with GPU hardware.

AMD HIP Ray Tracing

HIP RT is a ray tracing library for HIP, making it easy to write ray tracing applications in HIP.

AMD Orochi

Orochi is a library which loads HIP and CUDA® APIs dynamically, allowing the user to switch APIs at runtime.

AMD Radeon ProRender Developer Suite

AMD Radeon™ ProRender is our fast, easy, and incredible physically-based rendering engine built on industry standards that enables accelerated rendering on virtually any GPU, any CPU, and any OS in over a dozen leading digital content creation and CAD applications.

AMD Radeon ML

Radeon™ Machine Learning (Radeon™ ML or RML) is an AMD SDK for high-performance deep learning inference on GPUs.

AMD Radeon Image Filter

Harness the power of machine learning to enhance images with denoising, enabling your application to produce high quality images in a fraction of the time traditional denoising filters take.

AMD Advanced Media Framework

The Advanced Media Framework SDK provides developers with optimal access to AMD GPUs for multimedia processing.

AMD GPUOpen Direct3D12 Memory Allocator (D3D12MA)

The D3D12 Memory Allocator (D3D12MA) is a C++ library that provides a simple and easy-to-integrate API to help you allocate memory for DirectX®12 buffers and textures.

ADL

The AMD Display Library (ADL) SDK is designed to access display driver functionality for AMD Radeon™ and AMD FirePro™ graphics cards.

AGS

The AMD GPU Services (AGS) library provides software developers with the ability to query AMD GPU software and hardware state information that is not normally available through standard operating systems or graphics APIs.

AMD GPUOpen Vulkan Memory Allocator

VMA is our single-header, MIT-licensed, C++ library for easily and efficiently managing memory allocation for your Vulkan® games and applications.

AMD TrueAudio Next

AMD TrueAudio Next is a software development kit for GPU accelerated and multi-core high-performance audio signal processing.

AMD Radeon ProRender SDK

AMD Radeon™ ProRender SDK is a powerful physically-based path traced rendering engine that enables creative professionals to produce stunningly photorealistic images.

AMD Radeon Rays

The lightweight accelerated ray intersection library for DirectX®12 and Vulkan®.

AMD Compressonator

Compressonator is a set of tools to allow artists and developers to more easily work with compressed assets and easily visualize the quality impact of various compression technologies.

LiquidVR™ provides a Direct3D 11 based interface for applications to access GPU features regardless of whether a VR device is installed on a system.