The GPU Performance API (GPUPerfAPI, or GPA) is a library that provides access to GPU performance counters. It helps you analyze the performance and execution characteristics of applications running on a Radeon™ GPU.

GPUPerfAPI is now integrated into RenderDoc.

Download the latest version - v3.7

This release includes the following changes:

  • Add support for additional GPUs and APUs, including AMD RDNA™ 2 Radeon™ RX 6000 series GPUs.
  • New RT counters for DXR workloads on AMD RDNA™ 2 Radeon™ RX 6000 series GPUs:
    • RayTriTests and RayBoxTests: These counters collect the number of ray intersections for triangles and boxes, respectively.
    • TotalRayTests: This counter collects the aggregated number of ray-box and ray-triangle intersection tests.
    • RayTestsPerWave: This counter collects ray intersection test count at a more granular level – per wave.
  • New Scalar and Instruction cache counters on AMD RDNA™ Radeon™ RX 5000 series GPUs:
    • Scalar cache: ScalarCacheHit, ScalarCacheRequestCount, ScalarCacheHitCount, ScalarCacheMissCount.
    • Instruction cache: InstCacheHit, InstCacheRequestCount, InstCacheHitCount, InstCacheMissCount.
  • Update the Vulkan® sample to remove the static link and use the system-specific Vulkan® loader.
  • Remove OpenCL™ support from Linux.
  • Remove downloading the Vulkan® SDK by the build script.
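
The counters listed above relate to one another by simple arithmetic. The following C++ helpers are a minimal sketch, not GPUPerfAPI code, and assume the raw values have already been read back from a profiling session. They show how TotalRayTests aggregates the two ray test counters and how a hit percentage such as ScalarCacheHit could be derived from a hit count and a request count. The struct RayTestSample and both function names are hypothetical, and the exact formula GPUPerfAPI uses for the hit counters is an assumption.

```cpp
#include <cstdint>

// Hypothetical container for raw counter values already collected
// from a profiling session (not a GPUPerfAPI type).
struct RayTestSample {
    std::uint64_t ray_tri_tests;  // value of the RayTriTests counter
    std::uint64_t ray_box_tests;  // value of the RayBoxTests counter
};

// Aggregated number of ray-box and ray-triangle intersection tests,
// matching the description of the TotalRayTests counter.
std::uint64_t TotalRayTests(const RayTestSample& sample) {
    return sample.ray_tri_tests + sample.ray_box_tests;
}

// Assumed relationship between a hit count and a request count,
// expressed as a percentage. Returns 0 when there were no requests.
double HitPercentage(std::uint64_t hit_count, std::uint64_t request_count) {
    if (request_count == 0) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(hit_count) /
           static_cast<double>(request_count);
}
```

For example, a sample with 120 triangle tests and 880 box tests gives a total of 1000 ray tests, and 75 hits out of 100 requests gives a hit percentage of 75.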



Supported GPUs

  • Radeon™ RX 6000 series
  • Radeon RX 5500 series and RX 5300 series
  • Radeon RX 5700 and RX 5700 XT
  • Radeon VII
  • Radeon RX Vega
  • Ryzen 5 2400G and Ryzen 3 2200G Processors with Radeon Vega Graphics
  • Radeon R9 Fury, Fury X and Fury Nano
  • Radeon RX 400 and RX 500
  • Tonga R9 285, R9 380

Supported graphics APIs

  • DirectX® 12
  • Vulkan®
  • DirectX® 11
  • OpenGL®

Supported compute APIs

  • OpenCL™

Supported OSs

  • Windows® 10
  • Linux – Ubuntu 20.04 LTS

Version history

  • Add support for additional GPUs and APUs, including AMD Ryzen™ 4000 Series APUs.
  • Add two new GFX10 GlobalMemory Counters for graphics using DX12 and Vulkan®: LocalVidMemBytes and PcieBytes.
  • Add VS2019 project support to CMake.
  • Restructure the GPA source layout to adhere to Google style.
  • Add support for additional GPUs and APUs, including Radeon™ 5500 and Radeon™ 5300 Series GPUs.
  • Add DirectX®11 sample application using GPUPerfAPI.
  • Add per-API static counter generation.
  • Decrease the size of GPUPerfAPI binaries.
  • Add script to package GPUPerfAPI post-build.
  • Remove ROCm/HSA support.
  • Add Unicode support in GPUPerfAPI for Linux.
  • Bugs Fixed:
    • Fixed CMake files to respect supported build flags.
    • Fixed crash when DX12 debug layer was enabled.
    • Fixed an issue with shader loading in the GPA Vulkan® sample app.
    • Fixed a Vulkan® build issue with newer Vulkan® SDKs and the amd_shader_core_properties2 extension.
    • Fixed a crash on unsupported Gfx6 and Gfx7 GPUs.
  • Add support for additional GPUs and APUs, including Radeon 5700 Series GPUs.
  • Add support for setting stable GPU clocks for DirectX11, OpenGL and OpenCL.
  • Add an OpenGL sample application that uses GPUPerfAPI.
  • Add basic counter validation to sample applications.
  • Add support for enabling individual hardware counters that make up derived counters.
  • Add two new GFX9 GlobalMemory Counters for graphics: LocalVidMemBytes and PcieBytes.
  • Reformat source code using clang-format.
  • Update counter documentation to contain per-hardware-generation tables.
  • Bugs Fixed:
    • Fixed error handling in GPA_GetEnabledIndex, GPA_EnableCounterByName, and GPA_DisableCounterByName.
    • Fixed an issue with Vulkan timing counters (
    • Fixed an issue with SALUBusy counters.
    • Fixed an issue with HiZQuadsCulledCount and HiZQuadsSurvivingCount counters on GFX8 GPUs.
    • Fixed an issue with MemUnitBusy and MemUnitStalled counters on GFX8 GPUs.
    • Fixed an issue with VSVALUBusyCycles counter on GFX9 GPUs.
  • Add support for additional GPUs and APUs.
  • New CMake-based build system.
  • Support building on Ubuntu 18.04.
  • ROCm/HSA: Use the new library rather than the deprecated one for performance counter collection.
  • Timing-based counters are now reported in nanoseconds instead of milliseconds.
  • New timing counter to report top-of-pipe to bottom-of-pipe duration.
  • GPA now builds GoogleTest libraries on the fly rather than using prebuilt binaries.
  • Add support for additional GPUs and APUs.
  • Wrapped all GPA entrypoints in try/catch to ensure unhandled exceptions do not escape the GPA library.
  • Add VS2017 project files.
  • Bugs Fixed:
  • Add support for additional GPUs and APUs.
  • Usability improvements to GPAInterfaceLoader.h.
  • New Vulkan and DirectX 12 sample applications.
  • New GPA_GetSampleId entry point.
  • New GPA_GetVersion entry point.
  • Bugs Fixed:
    • Fixed issues with some counters on 56CU Vega10.
    • Vulkan: Fixed GPA_ContinueSampleOnCommandList.
    • Vulkan: Ensure results are ready before trying to query them.
    • DirectX 12: Fixed incorrect device reference counting issue.
  • Add support for additional GPUs and APUs.
  • Support for collecting hardware counters for Vulkan and DirectX 12 applications.
  • Redesigned API to support modern graphics APIs.
  • The documentation has been rewritten and is now available in HTML format.
  • New counters added:
    • Cycle and count-based counters in addition to existing percentage-based counters.
    • New Depth Buffer memory read/write counters.
    • Additional Color Buffer memory counters.
    • For graphics, several global memory counters which were previously available only in the Compute Shader stage are now available generically.
  • Support for setting stable GPU clocks.
  • Counter Group Names can now be queried separately from Counter Descriptions.
  • Counters now have a UUID which can be used to uniquely identify a counter.
  • New entry point (GPA_GetFuncTable) to retrieve a table of function pointers for all GPA entry points.
  • New C++ GPAInterfaceLoader.h header file provides an easy way to load and use GPA entry points.
  • Bugs Fixed:
    • Fixed an issue with TesselatorBusy counter on many GFX8 GPUs.
    • Fixed an issue with FlatVMemInsts and CSFlatVMemInsts counters on many GFX8 GPUs.
    • Fixed an issue with LDSInsts counter on Vega GPUs.
    • Fixed some issues with Compute Shader counters on Vega GPUs.
    • Some counter combinations could lead to incorrect counter results.
    • Enabling counters in a certain order can lead to incorrect counter scheduling across multiple passes.
    • ROCm/HSA: GPA_OpenContext crashes if can’t be found.
    • ROCm/HSA: GPA does not coexist nicely with an application that also sets the HSA_TOOLS_LIB environment variable.
    • OpenGL: Fixed a crash that can occur with an incorrectly-configured OpenGL driver.
    • OpenGL: Fixed some issues with OpenGL device-detection.
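
One of the changes listed above moved timing-based counters from milliseconds to nanoseconds. Code written against the older behaviour may need a conversion step when it consumes those values. The following is a minimal sketch under stated assumptions: the helper name NanosToMillis is hypothetical, it is not part of GPUPerfAPI, and it assumes the counter value has already been retrieved as a double.

```cpp
// Number of nanoseconds in one millisecond.
constexpr double kNanosPerMilli = 1000000.0;

// Convert a timing counter value reported in nanoseconds
// into milliseconds, for code that still expects the old unit.
double NanosToMillis(double nanos) {
    return nanos / kNanosPerMilli;
}
```

A counter value of 2500000 nanoseconds converts to 2.5 milliseconds.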
