Usage

This page provides an overview of the GPUPerfAPI library. Please refer to the API Reference Page for more information.

Transitioning from GPA 3.x to 4.0

Summary

The main change in GPA 4.0 is that the counters are no longer exposed via the GpaContext and are now exposed by the GpaSession. There are now two different sets of counters, depending on the GpaSessionSampleType that is passed in when creating the session. Not all hardware and not all rendering APIs support all the GpaSessionSampleTypes, so after creating the GpaContext you should call GpaGetSupportedSampleTypes(..) to query which GpaSessionSampleTypes are supported on the given configuration. Users of the GPU Performance API should still only be looking for the kGpaSessionSampleTypeDiscreteCounter sample type. No support will be provided for the other sample types. Once the GpaSession is created, you can then use GpaGetNumCounters(..) and all of the GpaGetCounter*(..) entrypoints to query information about the available counters.

Below you will find an example of a likely subset of GPA 3.x series of API calls, and then a similar series of calls using the modified GPA 4.0 entrypoints. Note that no error checking is being done here for sake of clarity.

Example of GPA 3.x Usage

Copied!

GpaContextId context = nullptr;
GpaOpenContext(api_context, kGpaOpenContextDefaultBit, &context);
GpaUInt32 num_counters = 0;
GpaGetNumCounters(context, &num_counters);
for (GpaUInt32 i = 0; i < num; ++i)
{
    const char* name = nullptr;
    GpaUInt32 index = 0;
    const char* description = nullptr;
    const char* group = nullptr;
    GpaDataType data_type = kGpaDataTypeLast;
    GpaUsageType usage_type = kGpaUsageTypeList;
    GpaUuid uuid = {};
    GpaCounterSampleType sample_type = kGpaCounterSampleTypeDiscrete;

    GpaGetCounterName(context, i, &name);
    GpaGetCounterIndex(context, name, &index);
    assert(i == index);
    GpaGetCounterDescription(context, i, &description);
    GpaGetCounterDataType(context, i, &data_type);
    GpaGetCounterUsageType(context, i, &usage_type);
    GpaGetCounterUuid(context, i, &uuid);
    GpaGetCounterSampleType(context, i, &sample_type);
    assert(sample_type == kGpaCounterSampleTypeDiscrete);
}
GpaSessionId session = nullptr;
GpaCreateSession(context, kGpaContextSampleTypeDiscreteCounter, &session);
GpaUInt32 desired_counter_index = 0;
GpaEnableCounter(session, desired_counter_index);
GpaBeginSession(session);
...

Example of Similar GPA 4.0 Usage

Comments are added in the code below to explain the new code changes.

Copied!

GpaContextId context = nullptr;
GpaOpenContext(api_context, kGpaOpenContextDefaultBit, &context);

// These next two lines and the following conditional ensure that discrete counters are supported
// by the GpaContext for the current hardware and driver. In general the GpaOpenContext call above
// will report an error if the hardware and driver are not supported at all, but in some rare cases
// the discrete counters may not be supported while other sample types are supported.
GpaContextSampleTypeFlags supported_sample_types = kGpaSessionSampleTypeLast;
GpaGetSupportedSampleTypes(context, &supported_sample_types);

if (supported_sample_types & kGpaSessionSampleTypeDiscreteCounter != 0)
{
    // The set of available counters are now dependent on the SampleType being collected within
    // the session, so the session must now be created prior to querying for the available counters.
    GpaSessionId session = nullptr;
    GpaCreateSession(context, kGpaContextSampleTypeDiscreteCounter, &session);

    // The GpaSessionId is now supplied into GpaGetNumCounters() to get the number
    // of discrete counters that are available.
    GpaUInt32 num_counters = 0;
    GpaGetNumCounters(session, &num_counters);
    for (GpaUInt32 i = 0; i < num; ++i)
    {
        const char* name = nullptr;
        GpaUInt32 index = 0;
        const char* description = nullptr;
        const char* group = nullptr;
        GpaDataType data_type = kGpaDataTypeLast;
        GpaUsageType usage_type = kGpaUsageTypeList;
        GpaUuid uuid = {};
        GpaCounterSampleType sample_type = kGpaCounterSampleTypeDiscrete;

        // These next set of counter querying entrypoints now take in a GpaSessionId
        // rather than the GpaContextId.
        GpaGetCounterName(session, i, &name);
        GpaGetCounterIndex(session, name, &index);
        assert(i == index);
        GpaGetCounterDescription(session, i, &description);
        GpaGetCounterDataType(session, i, &data_type);
        GpaGetCounterUsageType(session, i, &usage_type);
        GpaGetCounterUuid(session, i, &uuid);
        GpaGetCounterSampleType(session, i, &sample_type);
        assert(sample_type == kGpaCounterSampleTypeDiscrete);
    }
    GpaUInt32 desired_counter_index = 0;
    GpaEnableCounter(session, desired_counter_index);
    GpaBeginSession(session);
    ...
}

Loading the GPUPerfAPI Library

GPUPerfAPI binary releases include separate library files for each supported API. The following table shows the name of the library files for each API

API

Library Names

Vulkan

64-bit Windows: GPUPerfAPIVK-x64.dll
64-bit Linux: libGPUPerfAPIVK.so

DirectX 12

64-bit Windows: GPUPerfAPIDX12-x64.dll

DirectX 11

64-bit Windows: GPUPerfAPIDX11-x64.dll

OpenGL

64-bit Windows: GPUPerfAPIGL-x64.dll
64-bit Linux: libGPUPerfAPIGL.so

To use the GPUPerfAPI library:

  • Include the header file gpu_performance_api/gpu_perf_api.h. For Vulkan, include gpu_performance_api/gpu_perf_api_vk.h.

  • Declare a variable of type GpaGetFuncTablePtrType

  • Load the GPUPerfAPI library

    • On Windows, use LoadLibrary on the GPUPerfAPI DLL for your chosen API (see above table)

    • On Linux, use dlopen on the GPUPerfAPI shared library for your chosen API (see above table)

  • Get the address of the GpaGetFuncTable function

    • On Windows, use GetProcAddres

    • On Linux, use dlsym

  • Call GpaGetFuncTable to get a table of function pointers for each API.

All of the above can be simplified using the gpu_perf_api_interface_loader.h C++ header file. This header file simplifies the loading and initialization of the GPA entrypoints. The following code shows how to use this header file to load and initialize the DirectX 12 version of GPA:

Copied!

#include "gpu_performance_api/gpu_perf_api_interface_loader.h"

#ifdef __cplusplus
GpaApiManager* GpaApiManager::gpa_api_manager_ = nullptr;
#endif
GpaFuncTableInfo* gpa_function_table_info = nullptr;
GpaFunctionTable* gpa_function_table      = nullptr;

bool InitializeGpa()
{
    bool ret_val = false;

    if (kGpaStatusOk == GpaApiManager::Instance()->LoadApi(kGpaApiDirectx12))
    {
        gpa_function_table = GpaApiManager::Instance()->GetFunctionTable(kGpaApiDirectx12);

        if (nullptr != gpa_function_table)
        {
            ret_val = kGpaStatusOk == gpa_function_table->GpaInitialize(kGpaInitializeDefaultBit);
        }
    }

    return ret_val;
}

Registering a Logging Callback

An entrypoint is available for registering a callback function which GPUPerfAPI will use to report back additional information about errors and general API usage. It is recommended that all GPUPerfAPI clients register a logging callback for error messages at a minimum. Any time a GPUPerfAPI function returns an error, it will output a log message with more information about the condition that caused the error.

In order to use this feature, you must define a static function with the following signature:

Copied!

void MyLoggingFunction(GpaLoggingType message_type, const char* message)

The function is registered using the GpaRegisterLoggingCallback entrypoint.

The function registered will receive callbacks for message types registered. The message type is passed into the logging function so that different message types can be handled differently if desired. For instance, errors could be output to stderr or be used to raise an assert, while messages and trace information could be output to an application’s or tool’s normal log file. A tool may also want to prefix log messages with a string representation of the log type before writing the message. The messages passed into the logging function will not have a newline at the end, allowing for more flexible handling of the message.

Initializing and Destroying a GPUPerfAPI Instance

GPUPerfAPI must be initialized before the rendering context or device is created, so that the driver can be prepared for accessing hardware data. In the case of DirectX 12 or Vulkan, initialization must be done before a queue is created. Once you are done using GPUPerfAPI, you should destroy the GPUPerfAPI instance. In the case of DirectX 12, destruction must be done before the device is destroyed.

The following methods can be used to initialize and destroy GPUPerfAPI:

GPA Initialization/Destruction Method

Brief Description

GpaInitialize

Initializes the driver so that counters are exposed.

GpaDestroy

Undoes any initialization to ensure proper behavior in applications that are not being profiled.

An example of the code used to initialize a GPUPerfAPI instance can be seen above in the GpaInterfaceLoader sample code

Opening and Closing a Context

After initializing a GPUPerfAPI instance and after the necessary API-specific construct has been created, a context can be opened using the GpaOpenContext function. Once a context is open you can query information about the hardware device, including the supported sample types, and create and begin a session. After you are done using GPUPerfAPI, you should close the context.

The following methods can be used to open and close contexts:

Context Handling Method

Brief Description

GpaOpenContext

Opens the counters in the specified context for reading.

GpaCloseContext

Closes the counters in the specified context.

When calling GpaOpenContext, the type of the supplied context is different depending on which API is being used. See the table below for the required type which should be passed to GpaOpenContext:

API

GpaOpenContext context Parameter Type

Vulkan

GpaVkContextOpenInfo*
(defined in gpu_perf_api_vk.h)

DirectX 12

ID3D12Device*

DirectX 11

ID3D11Device*

OpenGL

Windows: HGLRC
Linux: GLXContext

Querying a Context and Sample Types

After creating a context, you can use the returned GpaContextId to query information about the hardware and the types of performance counter samples available.

The following methods can be used to query information about the context:

Context Query Method

Brief Description

GpaGetSupportedSampleTypes

Gets a mask of the sample types supported by the specified context.

GpaGetDeviceAndRevisionId

Gets the GPU device and revision id associated with the specified context.

GpaGetDeviceName

Gets the device name of the GPU associated with the specified context.

GpaGetDeviceGeneration

Gets the device generation of the GPU associated with the specified context.

Creating and Using a Session

After creating a context, a session can be created. A session is specific to a type of performance counter samples, exposes the available counters, and is also the container for enabling counters, sampling GPU workloads, and storing/retrieving the results.

The following methods can be used to manage sessions:

Session Handling Method

Brief Description

GpaCreateSession

Creates a session.

GpaDeleteSession

Deletes a session object.

GpaBeginSession

Begins sampling with the currently enabled set of counters.

GpaEndSession

Ends sampling with the currently enabled set of counters.

The following methods can be used to query information about performance counters:

Counter Query Method

Brief Description

GpaGetNumCounters

Gets the number of counters available.

GpaGetCounterName

Gets the name of the specified counter.

GpaGetCounterIndex

Gets index of a counter given its name (case insensitive).

GpaGetCounterGroup

Gets the group of the specified counter.

GpaGetCounterDescription

Gets the description of the specified counter.

GpaGetCounterDataType

Gets the data type of the specified counter.

GpaGetCounterUsageType

Gets the usage type of the specified counter.

GpaGetCounterUuid

Gets the UUID of the specified counter.

GpaGetCounterSampleType

Gets the supported sample type of the specified counter.

GpaGetDataTypeAsStr

Gets a string with the name of the specified counter data type.

GpaGetUsageTypeAsStr

Gets a string with the name of the specified counter usage type.

Enabling Counters on a Session

After creating a session but before sampling on that session, counters should be enabled. This must be done after GpaCreateSession is called, but before GpaBeginSession is called.

The following methods can be used to enable/disable counters on a session:

Counter Enable/Disable Method

Brief Description

GpaEnableCounter

Enables a specified counter.

GpaDisableCounter

Disables a specified counter.

GpaEnableCounterByName

Enables a specified counter using the counter name (case insensitive).

GpaDisableCounterByName

Disables a specified counter using the counter name (case insensitive).

GpaEnableAllCounters

Enables all counters.

GpaDisableAllCounters

Disables all counters.

Querying Enabled Counters and Counter Scheduling

A session can be also queried for information about which counters are enabled as well as information on the number of passes required for the current set of enabled counters.

The following methods can be used to query enabled counters and counter scheduling on a session:

Counter Scheduling Query Method

Brief Description

GpaGetPassCount

Gets the number of passes required for the currently enabled set of counters.

GpaGetNumEnabledCounters

Gets the number of enabled counters.

GpaGetEnabledIndex

Gets the counter index for an enabled counter.

GpaIsCounterEnabled

Checks whether or not a counter is enabled.

Creating and Managing Samples

After counters are enabled on a session and the session has been started, GPA command lists and samples can be created. A sample is the GPU workload for which performance counters are to be collected. All enabled counters will be collected for each sample. For DirectX 12 and Vulkan, samples can start on one command list and end on another. There is also special handling needed for DirectX 12 bundles and Vulkan secondary command buffers.

The following methods can be used to create and manage samples on a session:

Sample Handling Method

Brief Description

GpaBeginCommandList

Begins command list for sampling.

GpaEndCommandList

Ends command list for sampling.

GpaBeginSample

Begins a sample in a command list.

GpaEndSample

Ends a sample in a command list.

GpaContinueSampleOnCommandList

Continues a primary command list sample on another primary command list.

GpaCopySecondarySamples

Copies a set of samples from a secondary command list back to the primary command list that executed the secondary command list.

GpaGetSampleCount

Returns the number of samples created for the specified session.

Querying Results

Once sampling is complete and the session has been ended, the sample results can be read. For DirectX 12 and Vulkan, the command list or command buffer which contains the samples must have been fully executed before results will be available.

The following methods can be used to check if results are available and to read the results for samples:

Results Querying Method

Brief Description

GpaIsPassComplete

Checks whether or not a pass has finished.

GpaIsSessionComplete

Checks if results for all samples within a session are available.

GpaGetSampleResultSize

Gets the result size for a given sample.

GpaGetSampleResult

Gets the result data for a given sample.

Displaying Status/Error

All GPUPerfAPI functions return a GpaStatus code to indicate success or failure. A simple string representation of the status or error codes can be retrieved using the following method:

Status/Error Helper Method

Brief Description

GpaGetStatusAsStr

Gets a string representation of a GpaStatus value.

Multi-pass Counter Collection

Collection of some individual counters and some combinations of counters will require more than one pass. After enabling counters, you can query the number of passes required. If the number of passes is greater than one, you will need to execute an identical GPU workload once for each pass. For DirectX 12 and Vulkan, this typically means recording the same command list or command buffer more than once, calling GpaBeginCommandList on each command list for each pass, and beginning and ending samples for the same workloads within the command lists. For other graphics and compute APIs, this means making the same draw calls or dispatching the same kernels in the same sequence multiple times. The same sample id must be found in every pass, and that sample id must be used for the same workload within each pass. If it is impossible or impractical to repeat the operations to be profiled, select a counter set requiring only a single pass. For sets requiring more than one pass, results are available only after all passes are complete.

Specific Usage Note for Vulkan

In order to enable counter collection in the Vulkan driver, several Vulkan extensions are required. The application being profiled with GPUPerfAPI will need to request those extensions as part of the Vulkan instance and device initialization. GPUPerfAPI simplifies this by defining three macros in the gpu_performance_api/gpu_perf_api_vk.h header file: AMD_GPA_REQUIRED_INSTANCE_EXTENSION_NAME_LIST for the required instance extensions, AMD_GPA_REQUIRED_DEVICE_EXTENSION_NAME_LIST for the required device extensions and AMD_GPA_OPTIONAL_DEVICE_EXTENSION_NAME_LIST for optional, but recommended, device extensions. The extensions defined in AMD_GPA_REQUIRED_INSTANCE_EXTENSION_NAME_LIST should be included in the VkInstanceCreateInfo structure that is passed to the vkCreateInstance function. Similarly, the extensions defined in AMD_GPA_REQUIRED_DEVICE_EXTENSION_NAME_LIST and AMD_GPA_OPTIONAL_DEVICE_EXTENSION_NAME_LIST should be included in the VkDeviceCreateInfo structure that is passed to VkCreateDevice function.

Specific Usage Note for Bundles (DirectX 12) and Secondary Command Buffers (Vulkan)

While samples within a Bundle or Secondary Command Buffer (both referred to here as “secondary command lists”) are supported by GPUPerfAPI, they require special handling. Both the primary and secondary command list must be started using GpaBeginCommandList. Samples can be created on both types of command lists; however, the samples on the secondary command list must be copied back to the primary command list. This is done using the GpaCopySecondarySamples function. Once samples are copied back to the primary command list, results will be available after the primary command list has been executed. Bundles or secondary command buffers must be re-recorded for each counter pass. This also means that extra GpaCommandListId instances must be created (one per pass for each bundle or secondary command buffer) in order to support copying the results from the bundles or secondary command buffers after execution.

Specific Usage Note for Samples that Start and End on Different Command Lists

For DirectX 12 and Vulkan, GPUPerfAPI supports starting a sample on one command list and ending it on another. For this to work properly, the command lists must be executed in the correct order by the application – the command list which ends the sample must be executed after the command list which begins the sample. Both the command list where the sample starts and the command list where the sample ends must be started using GpaBeginCommandList. After the sample has been started on the first command list using GpaBeginSample, it can be continued on another command list by calling GpaContinueSampleOnCommandList. After it has been continued, the sample can be ended using GpaEndSample and specifying the second command list.

Deploying GPUPerfAPI

To deploy an application that uses GPUPerfAPI, simply make sure that the necessary GPUPerfAPI library is available and can be loaded using the normal library search mechanism for the host operating system (i.e. in the PATH on Windows and LD_LIBRARY_PATH on Linux).

When deploying the DirectX 11 version on Windows, you will also need to deploy GPUPerfAPIDXGetAMDDeviceInfo-x64.dll, if you need to support systems with multiple AMD GPUs. This library is used by GPA to determine which GPU is being used for rendering at runtime. For single-GPU systems, this library is not required.

Related pages

Looking for more documentation on GPUOpen?

AMD GPUOpen software blogs

Our handy software release blogs will help you make good use of our tools, SDKs, and effects, as well as sharing the latest features with new releases.

GPUOpen Manuals

Don’t miss our manual documentation! And if slide decks are what you’re after, you’ll find 100+ of our finest presentations here.

AMD GPUOpen Performance Guides

The home of great performance and optimization advice for AMD RDNA™ 2 GPUs, AMD Ryzen™ CPUs, and so much more.

Getting started: AMD GPUOpen software

New or fairly new to AMD’s tools, libraries, and effects? This is the best place to get started on GPUOpen!

AMD GPUOpen Getting Started Development and Performance

Looking for tips on getting started with developing and/or optimizing your game, whether on AMD hardware or generally? We’ve got you covered!

AMD GPUOpen Technical blogs

Browse our technical blogs, and find valuable advice on developing with AMD hardware, ray tracing, Vulkan®, DirectX®, Unreal Engine, and lots more.

Find out more about our software!

AMD GPUOpen Effects - AMD FidelityFX technologies

Create wonder. No black boxes. Meet the AMD FidelityFX SDK!

AMD GPUOpen Samples

Browse all our useful samples. Perfect for when you’re needing to get started, want to integrate one of our libraries, and much more.

AMD GPUOpen developer SDKs

Discover what our SDK technologies can offer you. Query hardware or software, manage memory, create rendering applications or machine learning, and much more!

AMD GPUOpen Developer Tools

Analyze, Optimize, Profile, Benchmark. We provide you with the developer tools you need to make sure your game is the best it can be!