Performance Counters

Copyright (c) 2018-2025 Advanced Micro Devices, Inc. All rights reserved.

GPU Performance Counters

The performance counters exposed through GPU Performance API are organized into groups to help provide clarity and organization to all the available data. Below is a collective list of counters from all the supported hardware generations. Some of the counters may not be available depending on the hardware being profiled. To view which GPUs belong to which hardware generations, the best reference is the gs_cardInfo array in the device_info repository on GitHub. You can see how the various cards map to hardware generations by looking at the GDT_HW_GENERATION enum.

For Graphics workloads, it is recommended that you initially profile with counters from the Timing group to determine whether the profiled calls are worth optimizing (based on the GPUTime value), and which parts of the pipeline are performing the most work. Note that because the GPU is highly parallelized, various parts of the pipeline can be active at the same time; thus, the “Busy” counters may sum to more than 100 percent. After identifying one or more stages to investigate further, enable the corresponding counter groups for more information on the stage and whether or not potential optimizations exist.
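As a rough illustration of this first triage step, the sketch below ranks a handful of “Busy” percentages for a single draw call. The values are made up for the example, and `rank_stages` is a hypothetical helper, not part of GPUPerfAPI.

```python
# Hypothetical Timing-group results for one draw call, in percent.
# The numbers are invented for illustration only.
busy = {
    "VsGsBusy": 35.0,
    "PSBusy": 82.0,
    "TexUnitBusy": 64.0,
    "DepthStencilTestBusy": 12.0,
}


def rank_stages(busy_pct):
    """Return (counter, percent) pairs ordered from busiest to least busy."""
    return sorted(busy_pct.items(), key=lambda item: item[1], reverse=True)


ranked = rank_stages(busy)

# Stages overlap in time, so the sum is not bounded by 100.
total = sum(busy.values())

for name, pct in ranked:
    print(f"{name:24s} {pct:5.1f}%")
print(f"Sum of busy percentages: {total:.1f}%")
```

In this made-up sample the pixel shader stage ranks first, so the PixelShader group would be the natural next set of counters to enable.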

Pipeline-Based Counter Groups

On RDNA, RDNA2, and RDNA3 hardware, certain use cases allow the driver to make optimizations by combining two shader stages together. For example, in a Vertex + Geometry + Pixel Shader pipeline (VS-GS-PS), the Vertex and Geometry Shaders are combined and GPUPerfAPI exposes them in the “VertexGeometry” group (counters with the “VsGs” prefix). In pipelines that use tessellation, the Vertex and Hull Shaders are combined and exposed as the “PreTessellation” group (with “PreTess” prefix), and the Domain and Geometry Shaders (if GS is used) are combined into the “PostTessellation” group (with “PostTess” prefix). Pixel Shaders and Compute Shaders are always exposed as their respective types. The table below may help to visualize the mapping between the API-level shaders (across the top), and which prefixes to look for in the GPUPerfAPI counters.
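| Pipeline | Vertex | Hull | Domain | Geometry | Pixel | Compute |
| --- | --- | --- | --- | --- | --- | --- |
| VS-PS | VsGs | | | | PS | |
| VS-GS-PS | VsGs | | | VsGs | PS | |
| VS-HS-DS-PS | PreTess | PreTess | PostTess | PostTess | PS | |
| VS-HS-DS-GS-PS | PreTess | PreTess | PostTess | PostTess | PS | |
| CS | | | | | | CS |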
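The mapping rules described above can be sketched as a small lookup function. This is only an illustration of the prose rules: `counter_prefix` and `STAGE_ABBREV` are hypothetical names, and the function rejects a stage that the pipeline string does not contain.

```python
# Abbreviations used in pipeline strings such as "VS-GS-PS".
STAGE_ABBREV = {
    "Vertex": "VS",
    "Hull": "HS",
    "Domain": "DS",
    "Geometry": "GS",
    "Pixel": "PS",
    "Compute": "CS",
}


def counter_prefix(pipeline, stage):
    """Return the counter prefix to look for, given a pipeline and an API stage."""
    stages = pipeline.split("-")
    abbrev = STAGE_ABBREV[stage]
    if abbrev not in stages:
        raise ValueError(f"{stage} shader is not part of a {pipeline} pipeline")
    # Pixel and Compute Shaders are always exposed as their own types.
    if abbrev in ("PS", "CS"):
        return abbrev
    # With tessellation, VS+HS and DS(+GS) are merged into two groups.
    if "HS" in stages:
        return "PreTess" if abbrev in ("VS", "HS") else "PostTess"
    # Without tessellation, VS and GS share the VertexGeometry group.
    return "VsGs"


print(counter_prefix("VS-GS-PS", "Geometry"))
print(counter_prefix("VS-HS-DS-GS-PS", "Domain"))
```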

A Note About Third-Party Applications

Several third-party applications (such as RenderDoc and Microsoft PIX) integrate GPUPerfAPI as part of their profiling feature set. These applications may choose to expose only a subset of the counters supported by GPUPerfAPI, especially in cases where the counters do not support the design goals of the application. Specifically, it is known that the counters reporting a percentage are not exposed in RenderDoc. This is due to the way that these tools collect and report aggregate performance counter values for groups of draw calls. For instance, if a set of draw calls is grouped together by a User Marker, a tool may report performance counter values for the User Marker by simply summing up the counter values for the individual draw calls. While this may be valid for many counters, it does not work well for percentage-based counters. Even a simple average of the percentage values may not accurately reflect the actual performance.

For most of the percentage-based counters, GPUPerfAPI also exposes counters representing the components used to calculate the percentage. One example is the cache hit counters: these are exposed both as a cache hit percentage and as individual counters representing the number of cache requests, the number of hits, and the number of misses. Refer to the Usage column of the tables below to see which counters will not be exposed by these applications.
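To make the aggregation problem concrete, the following sketch compares a simple average of per-draw hit percentages with a percentage recomputed from summed component counts. The draw-call values are invented, and `hit_percentage` is a hypothetical helper rather than anything GPUPerfAPI or the tools provide.

```python
def hit_percentage(hits, requests):
    """Cache hit percentage from component counts (0.0 when there are no requests)."""
    return 100.0 * hits / requests if requests else 0.0


# Two hypothetical draw calls grouped under one User Marker.
draws = [
    {"hits": 90, "requests": 100},    # 90% hit rate, few requests
    {"hits": 100, "requests": 1000},  # 10% hit rate, many requests
]

# Averaging the per-draw percentages weights both draws equally.
naive = sum(hit_percentage(d["hits"], d["requests"]) for d in draws) / len(draws)

# Summing the component counters first weights each draw by its request count.
total_hits = sum(d["hits"] for d in draws)
total_requests = sum(d["requests"] for d in draws)
aggregate = hit_percentage(total_hits, total_requests)

print(f"average of percentages: {naive:.1f}%")
print(f"from summed counts:     {aggregate:.1f}%")
```

The two results differ widely here because the second draw issues ten times as many requests as the first, which is why the count-based counters are the safer input for aggregation.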

Counters Exposed for Graphics Performance Analysis

The following tables show the set of counters exposed for analysis of GPU Graphics workloads, as well as the family of GPUs and APUs on which each counter is available:

RDNA4 Counters

Timing Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| GPUTime | Discrete | Nanoseconds | Time this API command took to execute on the GPU in nanoseconds from the time the previous command reached the bottom of the pipeline (BOP) to the time this command reaches the bottom of the pipeline (BOP). Does not include time that draw calls are processed in parallel. |
| ExecutionDuration | Discrete | Nanoseconds | GPU command execution duration in nanoseconds, from the time the command enters the top of the pipeline (TOP) to the time the command reaches the bottom of the pipeline (BOP). Does not include time that draw calls are processed in parallel. |
| ExecutionStart | Discrete | Nanoseconds | GPU command execution start time in nanoseconds. This is the time the command enters the top of the pipeline (TOP). |
| ExecutionEnd | Discrete | Nanoseconds | GPU command execution end time in nanoseconds. This is the time the command reaches the bottom of the pipeline (BOP). |
| GPUBusy | Discrete, Streaming | Percentage | The percentage of time the GPU command processor was busy. |
| GPUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the GPU command processor was busy. |
| TessellatorBusy | Discrete, Streaming | Percentage | The percentage of time the tessellation engine is busy. |
| TessellatorBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the tessellation engine is busy. |
| VsGsBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has VS or GS work to do in a VS-[GS-]PS pipeline. |
| VsGsBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has VS or GS work to do in a VS-[GS-]PS pipeline. |
| VsGsTime | Discrete | Nanoseconds | Time VS or GS are busy in nanoseconds in a VS-[GS-]PS pipeline. |
| PreTessellationBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has VS and HS work to do in a pipeline that uses tessellation. |
| PreTessellationBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has VS and HS work to do in a pipeline that uses tessellation. |
| PreTessellationTime | Discrete | Nanoseconds | Time VS and HS are busy in nanoseconds in a pipeline that uses tessellation. |
| PostTessellationBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has DS or GS work to do in a pipeline that uses tessellation. |
| PostTessellationBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has DS or GS work to do in a pipeline that uses tessellation. |
| PostTessellationTime | Discrete | Nanoseconds | Time DS or GS are busy in nanoseconds in a pipeline that uses tessellation. |
| PSBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has pixel shader work to do. |
| PSBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has pixel shader work to do. |
| PSTime | Discrete | Nanoseconds | Time pixel shaders are busy in nanoseconds. |
| CSBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has compute shader work to do. |
| CSBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has compute shader work to do. |
| CSTime | Discrete | Nanoseconds | Time compute shaders are busy in nanoseconds. |
| PrimitiveAssemblyBusy | Discrete | Percentage | The percentage of GPUTime that primitive assembly (clipping and culling) is busy. High values may be caused by having many small primitives; mid to low values may indicate pixel shader or output buffer bottleneck. |
| PrimitiveAssemblyBusyCycles | Discrete | Cycles | Number of GPU cycles the primitive assembly (clipping and culling) is busy. High values may be caused by having many small primitives; mid to low values may indicate pixel shader or output buffer bottleneck. |
| TexUnitBusy | Discrete, Streaming | Percentage | The percentage of GPUTime the texture unit is active. This is measured with all extra fetches and any cache or memory effects taken into account. |
| TexUnitBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles the texture unit is active. This is measured with all extra fetches and any cache or memory effects taken into account. |
| DepthStencilTestBusy | Discrete, Streaming | Percentage | Percentage of time GPU spent performing depth and stencil tests relative to GPUBusy. |
| DepthStencilTestBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles spent performing depth and stencil tests. |
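The start and end timestamps can be combined to reason about how commands overlap on the GPU. The sketch below is based only on the descriptions above: it assumes that the duration of a command is the difference between its ExecutionEnd and ExecutionStart values, and the timestamps and helper names are hypothetical.

```python
def execution_duration_ns(start_ns, end_ns):
    """Time between entering the top of the pipeline and reaching the bottom."""
    if end_ns < start_ns:
        raise ValueError("end timestamp precedes start timestamp")
    return end_ns - start_ns


def overlap_ns(first, second):
    """Nanoseconds during which two (start, end) commands were both in flight."""
    latest_start = max(first[0], second[0])
    earliest_end = min(first[1], second[1])
    return max(0, earliest_end - latest_start)


# Hypothetical (ExecutionStart, ExecutionEnd) pairs for two draw calls.
draw_a = (1_000, 5_000)
draw_b = (3_000, 9_000)

print(execution_duration_ns(*draw_a))  # 4000
print(execution_duration_ns(*draw_b))  # 6000
print(overlap_ns(draw_a, draw_b))      # 2000
```

A non-zero overlap like this is one reason per-command times do not simply add up to the time taken by the whole frame.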

VertexGeometry Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| VsGsVerticesIn | Discrete, Streaming | Items | The number of unique vertices processed by the VS and GS. |
| VsGsPrimsIn | Discrete, Streaming | Items | The number of primitives passed into the GS. |

PreTessellation Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| PreTessVerticesIn | Discrete, Streaming | Items | The number of unique vertices processed by the VS and HS when using tessellation. |

PostTessellation Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| PostTessPrimsOut | Discrete, Streaming | Items | The number of primitives output by the DS and GS when using tessellation. |

PrimitiveAssembly Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| PrimitivesIn | Discrete, Streaming | Items | The number of primitives received by the hardware. This includes primitives generated by tessellation. |
| CulledPrims | Discrete | Items | The number of culled primitives. Typical reasons include scissor, the primitive having zero area, and back or front face culling. |
| ClippedPrims | Discrete | Items | The number of primitives that required one or more clipping operations due to intersecting the view volume or user clip planes. |
| PAStalledOnRasterizer | Discrete, Streaming | Percentage | Percentage of GPUTime that primitive assembly waits for rasterization to be ready to accept data. This roughly indicates for what percentage of time the pipeline is bottlenecked by pixel operations. |
| PAStalledOnRasterizerCycles | Discrete, Streaming | Cycles | Number of GPU cycles the primitive assembly waits for rasterization to be ready to accept data. Indicates the number of GPU cycles the pipeline is bottlenecked by pixel operations. |

PixelShader Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| PSPixelsOut | Discrete, Streaming | Items | Pixels exported from shader to color buffers. Does not include killed or alpha tested pixels; if there are multiple render targets, each render target receives one export, so this will be 2 for 1 pixel written to two RTs. |
| PSExportStalls | Discrete, Streaming | Percentage | Pixel shader output stalls. Percentage of GPUBusy. Should be zero for PS or further upstream limited cases; if not zero, indicates a bottleneck in late Z testing or in the color buffer. |
| PSExportStallsCycles | Discrete, Streaming | Cycles | Number of GPU cycles the pixel shader output stalls. Should be zero for PS or further upstream limited cases; if not zero, indicates a bottleneck in late Z testing or in the color buffer. |

ComputeShader Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| CSThreadGroupsLaunched | Discrete, Streaming | Items | Total number of thread groups launched. |
| CSWavefrontsLaunched | Discrete, Streaming | Items | The total number of wavefronts launched for the CS. |
| CSThreadsLaunched | Discrete, Streaming | Items | The number of CS threads launched and processed by the hardware. |
| CSThreadGroupSize | Discrete, Streaming | Items | The number of CS threads within each thread group. |
| CSLDSBankConflict | Discrete, Streaming | Percentage | The percentage of GPUTime LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad). |
| CSLDSBankConflictCycles | Discrete, Streaming | Cycles | Number of GPU cycles the LDS is stalled by bank conflicts. Value range: 0 (optimal) to GPUBusyCycles (bad). |
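When sanity-checking the launch counters, it can help to estimate how many wavefronts a dispatch should produce. The sketch below assumes that each thread group is split into whole wavefronts of a fixed width; the wave width is passed in as a parameter because it depends on how the shader was compiled, and both helper names are hypothetical.

```python
import math


def expected_wavefronts(thread_groups, threads_per_group, wave_size):
    """Wavefronts needed if every thread group is rounded up to whole waves."""
    return thread_groups * math.ceil(threads_per_group / wave_size)


def lane_utilization(thread_groups, threads_per_group, wave_size):
    """Fraction of wavefront lanes that hold a real thread (1.0 is ideal)."""
    waves = expected_wavefronts(thread_groups, threads_per_group, wave_size)
    return (thread_groups * threads_per_group) / (waves * wave_size)


# Hypothetical dispatch: 100 groups of 48 threads, compiled for 32-wide waves.
print(expected_wavefronts(100, 48, 32))  # 200
print(lane_utilization(100, 48, 32))     # 0.75
```

In this example a group size of 48 leaves a quarter of the lanes idle, whereas a group size that is a multiple of the wave width would give a utilization of 1.0.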

TextureUnit Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| TexTriFilteringPct | Discrete, Streaming | Percentage | Percentage of pixels that received trilinear filtering. Note that not all pixels for which trilinear filtering is enabled will receive it (e.g. if the texture is magnified). |
| TexTriFilteringCount | Discrete, Streaming | Items | Count of pixels that received trilinear filtering. Note that not all pixels for which trilinear filtering is enabled will receive it (e.g. if the texture is magnified). |
| NoTexTriFilteringCount | Discrete, Streaming | Items | Count of pixels that did not receive trilinear filtering. |
| TexVolFilteringPct | Discrete, Streaming | Percentage | Percentage of pixels that received volume filtering. |
| TexVolFilteringCount | Discrete, Streaming | Items | Count of pixels that received volume filtering. |
| NoTexVolFilteringCount | Discrete, Streaming | Items | Count of pixels that did not receive volume filtering. |
| TexAveAnisotropy | Discrete | Items | The average degree of anisotropy applied. A number between 1 and 16. The anisotropic filtering algorithm only applies samples where they are required (e.g. there will be no extra anisotropic samples if the view vector is perpendicular to the surface) so this can be much lower than the requested anisotropy. |

DepthAndStencil Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| HiZQuadsCulled | Discrete | Percentage | Percentage of quads that did not have to continue on in the pipeline after HiZ. They may be written directly to the depth buffer, or culled completely. Consistently low values here may suggest that the Z-range is not being fully utilized. |
| HiZQuadsCulledCount | Discrete | Items | Count of quads that did not have to continue on in the pipeline after HiZ. They may be written directly to the depth buffer, or culled completely. Consistently low values here may suggest that the Z-range is not being fully utilized. |
| HiZQuadsAcceptedCount | Discrete, Streaming | Items | Count of quads that did continue on in the pipeline after HiZ. |
| PreZQuadsCulled | Discrete | Percentage | Percentage of quads rejected based on the detailZ and earlyZ tests. |
| PreZQuadsCulledCount | Discrete | Items | Count of quads rejected based on the detailZ and earlyZ tests. |
| PreZQuadsSurvivingCount | Discrete, Streaming | Items | Count of quads surviving detailZ and earlyZ tests. |
| PostZQuads | Discrete | Percentage | Percentage of quads for which the pixel shader will run and may be postZ tested. |
| PostZQuadCount | Discrete, Streaming | Items | Count of quads for which the pixel shader will run and may be postZ tested. |
| PreZSamplesPassing | Discrete, Streaming | Items | Number of samples tested for Z before shading and passed. |
| PreZSamplesFailingS | Discrete, Streaming | Items | Number of samples tested for Z before shading and failed stencil test. |
| PreZSamplesFailingZ | Discrete, Streaming | Items | Number of samples tested for Z before shading and failed Z test. |
| PostZSamplesPassing | Discrete, Streaming | Items | Number of samples tested for Z after shading and passed. |
| PostZSamplesFailingS | Discrete, Streaming | Items | Number of samples tested for Z after shading and failed stencil test. |
| PostZSamplesFailingZ | Discrete, Streaming | Items | Number of samples tested for Z after shading and failed Z test. |
| ZUnitStalled | Discrete, Streaming | Percentage | The percentage of GPUTime the depth buffer spends waiting for the color buffer to be ready to accept data. High figures here indicate a bottleneck in color buffer operations. |
| ZUnitStalledCycles | Discrete, Streaming | Cycles | Number of GPU cycles the depth buffer spends waiting for the color buffer to be ready to accept data. Larger numbers indicate a bottleneck in color buffer operations. |

ColorBuffer Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| CBMemRead | Discrete, Streaming | Bytes | Number of bytes read from the color buffer. |
| CBMemWritten | Discrete, Streaming | Bytes | Number of bytes written to the color buffer. |

MemoryCache Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| L0CacheHit | Discrete, Streaming | Percentage | The percentage of read requests that hit the data in the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
| L0CacheRequestCount | Discrete, Streaming | Items | The number of read requests made to the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. |
| L0CacheHitCount | Discrete, Streaming | Items | The number of read requests which result in a cache hit from the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. |
| L0CacheMissCount | Discrete, Streaming | Items | The number of read requests which result in a cache miss from the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. |
| ScalarCacheHit | Discrete, Streaming | Percentage | The percentage of read requests made from executing shader code that hit the data in the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
| ScalarCacheRequestCount | Discrete, Streaming | Items | The number of read requests made from executing shader code to the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. |
| ScalarCacheHitCount | Discrete, Streaming | Items | The number of read requests made from executing shader code which result in a cache hit from the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. |
| ScalarCacheMissCount | Discrete, Streaming | Items | The number of read requests made from executing shader code which result in a cache miss from the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. |
| InstCacheHit | Discrete, Streaming | Percentage | The percentage of read requests made that hit the data in the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
| InstCacheRequestCount | Discrete, Streaming | Items | The number of read requests made to the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. |
| InstCacheHitCount | Discrete, Streaming | Items | The number of read requests which result in a cache hit from the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. |
| InstCacheMissCount | Discrete, Streaming | Items | The number of read requests which result in a cache miss from the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. |
| L2CacheHit | Discrete, Streaming | Percentage | The percentage of read or write requests that hit the data in the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
| L2CacheMiss | Discrete, Streaming | Percentage | The percentage of read or write requests that miss the data in the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. Value range: 0% (optimal) to 100% (all miss). |
| L2CacheRequestCount | Discrete, Streaming | Items | The number of read or write requests made to the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. |
| L2CacheHitCount | Discrete, Streaming | Items | The number of read or write requests which result in a cache hit from the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. |
| L2CacheMissCount | Discrete, Streaming | Items | The number of read or write requests which result in a cache miss from the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. |
| L0TagConflictReadStalledCycles | Discrete, Streaming | Items | The number of cycles read operations from the L0 cache are stalled due to tag conflicts. |
| L0TagConflictWriteStalledCycles | Discrete, Streaming | Items | The number of cycles write operations to the L0 cache are stalled due to tag conflicts. |
| L0TagConflictAtomicStalledCycles | Discrete, Streaming | Items | The number of cycles atomic operations on the L0 cache are stalled due to tag conflicts. |
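Because the descriptions above give a fixed size per request, the request and miss counts can be turned into a rough byte volume. The sketch below uses the request sizes quoted in the table; the dictionary keys, the helper name, and the sample counts are hypothetical, and the result is only an upper-level estimate of traffic, not a measured bandwidth.

```python
# Request sizes in bytes, taken from the counter descriptions above.
REQUEST_BYTES = {
    "L0Cache": 128,
    "ScalarCache": 64,
    "InstCache": 64,
    "L2Cache": 128,
}


def request_volume_bytes(cache, count):
    """Approximate bytes represented by `count` requests (or misses) to a cache."""
    return REQUEST_BYTES[cache] * count


# Hypothetical sample: L0CacheRequestCount = 1000, L0CacheMissCount = 250.
requested = request_volume_bytes("L0Cache", 1000)
missed = request_volume_bytes("L0Cache", 250)

print(requested)  # 128000 bytes requested from L0
print(missed)     # 32000 bytes that had to come from further down the hierarchy
```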

GlobalMemory Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| FetchSize | Discrete, Streaming | Bytes | The total bytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
| WriteSize | Discrete, Streaming | Bytes | The total bytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
| MemUnitBusy | Discrete, Streaming | Percentage | The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound). |
| MemUnitBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles the memory unit is active. The result includes the stall time (MemUnitStalledCycles). This is measured with all extra fetches and writes and any cache or memory effects taken into account. |
| MemUnitStalled | Discrete, Streaming | Percentage | The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad). |
| MemUnitStalledCycles | Discrete, Streaming | Cycles | Number of GPU cycles the memory unit is stalled. |
| WriteUnitStalled | Discrete, Streaming | Percentage | The percentage of GPUTime the Write unit is stalled. Value range: 0% to 100% (bad). |
| WriteUnitStalledCycles | Discrete, Streaming | Cycles | Number of GPU cycles the Write unit is stalled. |
| LocalVidMemBytes | Discrete | Bytes | Number of bytes read from or written to the Infinity Cache (if available) or local video memory. |
| PcieBytes | Discrete, Streaming | Bytes | Number of bytes sent and received over the PCIe bus. |

RayTracing Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| RayTriTests | Discrete, Streaming | Items | The number of ray triangle intersection tests. |
| RayBoxTests | Discrete, Streaming | Items | The number of ray box intersection tests. |
| TotalRayTests | Discrete, Streaming | Items | Total number of ray intersection tests, includes both box and triangle intersections. |
| RayTestsPerWave | Discrete, Streaming | Items | The number of ray intersection tests per wave. |

WaveDistribution Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| WaveOccupancyPct | Streaming | Percentage | The percentage of the maximum wavefront occupancy that is currently being used. |

WaveOccupancyLimiters Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| HSLimitedByVgpr | Streaming | Percentage | The percentage of HS wave scheduling requests that are limited by VGPR availability. |
| HSLimitedByLds | Streaming | Percentage | The percentage of HS wave scheduling requests that are limited by LDS availability. |
| HSLimitedByScratch | Streaming | Percentage | The percentage of HS wave scheduling requests that are limited by scratch space availability. |
| HSLimitedByBarriers | Streaming | Percentage | The percentage of HS wave scheduling requests that are limited by barriers. |
| GSLimitedByVgpr | Streaming | Percentage | The percentage of GS wave scheduling requests that are limited by VGPR availability. |
| GSLimitedByLds | Streaming | Percentage | The percentage of GS wave scheduling requests that are limited by LDS availability. |
| GSLimitedByScratch | Streaming | Percentage | The percentage of GS wave scheduling requests that are limited by scratch space availability. |
| PSLimitedByLds | Streaming | Percentage | The percentage of PS wave scheduling requests that are limited by LDS availability. |
| PSLimitedByVgpr | Streaming | Percentage | The percentage of PS wave scheduling requests that are limited by VGPR availability. |
| PSLimitedByScratch | Streaming | Percentage | The percentage of PS wave scheduling requests that are limited by scratch space availability. |
| CSLimitedByLds | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by LDS availability. |
| CSLimitedByVgpr | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by VGPR availability. |
| CSLimitedByScratch | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by scratch space availability. |
| CSLimitedByBarriers | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by barriers. |
| CSLimitedByThreadGroupLimit | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by the thread group limit. |

RDNA3 Counters

Timing Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| GPUTime | Discrete | Nanoseconds | Time this API command took to execute on the GPU in nanoseconds from the time the previous command reached the bottom of the pipeline (BOP) to the time this command reaches the bottom of the pipeline (BOP). Does not include time that draw calls are processed in parallel. |
| ExecutionDuration | Discrete | Nanoseconds | GPU command execution duration in nanoseconds, from the time the command enters the top of the pipeline (TOP) to the time the command reaches the bottom of the pipeline (BOP). Does not include time that draw calls are processed in parallel. |
| ExecutionStart | Discrete | Nanoseconds | GPU command execution start time in nanoseconds. This is the time the command enters the top of the pipeline (TOP). |
| ExecutionEnd | Discrete | Nanoseconds | GPU command execution end time in nanoseconds. This is the time the command reaches the bottom of the pipeline (BOP). |
| GPUBusy | Discrete, Streaming | Percentage | The percentage of time the GPU command processor was busy. |
| GPUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the GPU command processor was busy. |
| TessellatorBusy | Discrete, Streaming | Percentage | The percentage of time the tessellation engine is busy. |
| TessellatorBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the tessellation engine is busy. |
| VsGsBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has VS or GS work to do in a VS-[GS-]PS pipeline. |
| VsGsBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has VS or GS work to do in a VS-[GS-]PS pipeline. |
| VsGsTime | Discrete | Nanoseconds | Time VS or GS are busy in nanoseconds in a VS-[GS-]PS pipeline. |
| PreTessellationBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has VS and HS work to do in a pipeline that uses tessellation. |
| PreTessellationBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has VS and HS work to do in a pipeline that uses tessellation. |
| PreTessellationTime | Discrete | Nanoseconds | Time VS and HS are busy in nanoseconds in a pipeline that uses tessellation. |
| PostTessellationBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has DS or GS work to do in a pipeline that uses tessellation. |
| PostTessellationBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has DS or GS work to do in a pipeline that uses tessellation. |
| PostTessellationTime | Discrete | Nanoseconds | Time DS or GS are busy in nanoseconds in a pipeline that uses tessellation. |
| PSBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has pixel shader work to do. |
| PSBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has pixel shader work to do. |
| PSTime | Discrete | Nanoseconds | Time pixel shaders are busy in nanoseconds. |
| CSBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has compute shader work to do. |
| CSBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has compute shader work to do. |
| CSTime | Discrete | Nanoseconds | Time compute shaders are busy in nanoseconds. |
| PrimitiveAssemblyBusy | Discrete | Percentage | The percentage of GPUTime that primitive assembly (clipping and culling) is busy. High values may be caused by having many small primitives; mid to low values may indicate pixel shader or output buffer bottleneck. |
| PrimitiveAssemblyBusyCycles | Discrete | Cycles | Number of GPU cycles the primitive assembly (clipping and culling) is busy. High values may be caused by having many small primitives; mid to low values may indicate pixel shader or output buffer bottleneck. |
| TexUnitBusy | Discrete, Streaming | Percentage | The percentage of GPUTime the texture unit is active. This is measured with all extra fetches and any cache or memory effects taken into account. |
| TexUnitBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles the texture unit is active. This is measured with all extra fetches and any cache or memory effects taken into account. |
| DepthStencilTestBusy | Discrete, Streaming | Percentage | Percentage of time GPU spent performing depth and stencil tests relative to GPUBusy. |
| DepthStencilTestBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles spent performing depth and stencil tests. |

VertexGeometry Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| VsGsVerticesIn | Discrete, Streaming | Items | The number of unique vertices processed by the VS and GS. |
| VsGsPrimsIn | Discrete, Streaming | Items | The number of primitives passed into the GS. |

PreTessellation Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| PreTessVerticesIn | Discrete, Streaming | Items | The number of unique vertices processed by the VS and HS when using tessellation. |

PostTessellation Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| PostTessPrimsOut | Discrete, Streaming | Items | The number of primitives output by the DS and GS when using tessellation. |

PrimitiveAssembly Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| PrimitivesIn | Discrete, Streaming | Items | The number of primitives received by the hardware. This includes primitives generated by tessellation. |
| CulledPrims | Discrete | Items | The number of culled primitives. Typical reasons include scissor, the primitive having zero area, and back or front face culling. |
| ClippedPrims | Discrete | Items | The number of primitives that required one or more clipping operations due to intersecting the view volume or user clip planes. |
| PAStalledOnRasterizer | Discrete, Streaming | Percentage | Percentage of GPUTime that primitive assembly waits for rasterization to be ready to accept data. This roughly indicates for what percentage of time the pipeline is bottlenecked by pixel operations. |
| PAStalledOnRasterizerCycles | Discrete, Streaming | Cycles | Number of GPU cycles the primitive assembly waits for rasterization to be ready to accept data. Indicates the number of GPU cycles the pipeline is bottlenecked by pixel operations. |

PixelShader Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| PSPixelsOut | Discrete, Streaming | Items | Pixels exported from shader to color buffers. Does not include killed or alpha tested pixels; if there are multiple render targets, each render target receives one export, so this will be 2 for 1 pixel written to two RTs. |
| PSExportStalls | Discrete, Streaming | Percentage | Pixel shader output stalls. Percentage of GPUBusy. Should be zero for PS or further upstream limited cases; if not zero, indicates a bottleneck in late Z testing or in the color buffer. |
| PSExportStallsCycles | Discrete, Streaming | Cycles | Number of GPU cycles the pixel shader output stalls. Should be zero for PS or further upstream limited cases; if not zero, indicates a bottleneck in late Z testing or in the color buffer. |

ComputeShader Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| CSThreadGroupsLaunched | Discrete, Streaming | Items | The total number of thread groups launched. |
| CSWavefrontsLaunched | Discrete, Streaming | Items | The total number of wavefronts launched for the CS. |
| CSThreadsLaunched | Discrete, Streaming | Items | The number of CS threads launched and processed by the hardware. |
| CSThreadGroupSize | Discrete | Items | The number of CS threads within each thread group. |
| CSALUStalledByLDS | Discrete | Percentage | The average percentage of GPUTime each wavefront’s ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad). |
| CSALUStalledByLDSCycles | Discrete | Cycles | The average number of GPU cycles each wavefront’s ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. |
| CSLDSBankConflict | Discrete, Streaming | Percentage | The average percentage of GPUTime an LDS is stalled due to bank conflicts. Value range: 0% (optimal) to 100% (bad). |
| CSLDSBankConflictCycles | Discrete, Streaming | Cycles | The average number of GPU cycles an LDS is stalled by bank conflicts. Value range: 0 (optimal) to GPUBusyCycles (bad). |
| CSALUStalledByLDSPerWave | Streaming | Percentage | The average percentage of GPUTime each wavefront’s ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad). |
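
As a sanity check on the launch counters, the following minimal Python sketch estimates what CSThreadsLaunched and CSWavefrontsLaunched might look like for a dispatch. It assumes that every thread group has the same size and that the wave size (32 or 64 lanes) is known; the function name and its parameters are hypothetical and are not part of GPUPerfAPI.

```python
import math


def expected_cs_launch(groups_launched: int, group_size: int, wave_size: int = 64) -> dict:
    """Estimate thread and wavefront totals for a dispatch (illustrative helper)."""
    # A thread group is split into whole wavefronts, so round up.
    waves_per_group = math.ceil(group_size / wave_size)
    return {
        "threads": groups_launched * group_size,
        "wavefronts": groups_launched * waves_per_group,
        # Fraction of wavefront lanes that actually map to a thread.
        "lane_utilization_pct": 100.0 * group_size / (waves_per_group * wave_size),
    }


estimate = expected_cs_launch(groups_launched=1024, group_size=96, wave_size=64)
print(estimate)
```

A thread group size that is not a multiple of the wave size leaves lanes idle in the last wavefront of each group, which is why the sketch also reports a lane utilization figure.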

TextureUnit Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| TexTriFilteringPct | Discrete, Streaming | Percentage | Percentage of pixels that received trilinear filtering. Note that not all pixels for which trilinear filtering is enabled will receive it (e.g. if the texture is magnified). |
| TexTriFilteringCount | Discrete, Streaming | Items | Count of pixels that received trilinear filtering. Note that not all pixels for which trilinear filtering is enabled will receive it (e.g. if the texture is magnified). |
| NoTexTriFilteringCount | Discrete, Streaming | Items | Count of pixels that did not receive trilinear filtering. |
| TexVolFilteringPct | Discrete, Streaming | Percentage | Percentage of pixels that received volume filtering. |
| TexVolFilteringCount | Discrete, Streaming | Items | Count of pixels that received volume filtering. |
| NoTexVolFilteringCount | Discrete, Streaming | Items | Count of pixels that did not receive volume filtering. |
| TexAveAnisotropy | Discrete | Items | The average degree of anisotropy applied. A number between 1 and 16. The anisotropic filtering algorithm only applies samples where they are required (e.g. there will be no extra anisotropic samples if the view vector is perpendicular to the surface) so this can be much lower than the requested anisotropy. |

DepthAndStencil Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| HiZTilesAccepted | Discrete, Streaming | Percentage | Percentage of tiles accepted by HiZ that will be rendered to the depth or color buffers. |
| HiZTilesAcceptedCount | Discrete, Streaming | Items | Count of tiles accepted by HiZ that will be rendered to the depth or color buffers. |
| HiZTilesRejectedCount | Discrete, Streaming | Items | Count of tiles not accepted by HiZ. |
| PreZTilesDetailCulled | Discrete, Streaming | Percentage | Percentage of tiles rejected because the associated primitive had no contributing area. |
| PreZTilesDetailCulledCount | Discrete, Streaming | Items | Count of tiles rejected because the associated primitive had no contributing area. |
| PreZTilesDetailSurvivingCount | Discrete, Streaming | Items | Count of tiles surviving because the associated primitive had contributing area. |
| HiZQuadsCulled | Discrete | Percentage | Percentage of quads that did not have to continue on in the pipeline after HiZ. They may be written directly to the depth buffer, or culled completely. Consistently low values here may suggest that the Z-range is not being fully utilized. |
| HiZQuadsCulledCount | Discrete | Items | Count of quads that did not have to continue on in the pipeline after HiZ. They may be written directly to the depth buffer, or culled completely. Consistently low values here may suggest that the Z-range is not being fully utilized. |
| HiZQuadsAcceptedCount | Discrete, Streaming | Items | Count of quads that did continue on in the pipeline after HiZ. |
| PreZQuadsCulled | Discrete | Percentage | Percentage of quads rejected based on the detailZ and earlyZ tests. |
| PreZQuadsCulledCount | Discrete | Items | Count of quads rejected based on the detailZ and earlyZ tests. |
| PreZQuadsSurvivingCount | Discrete, Streaming | Items | Count of quads surviving detailZ and earlyZ tests. |
| PostZQuads | Discrete | Percentage | Percentage of quads for which the pixel shader will run and may be postZ tested. |
| PostZQuadCount | Discrete, Streaming | Items | Count of quads for which the pixel shader will run and may be postZ tested. |
| PreZSamplesPassing | Discrete, Streaming | Items | Number of samples tested for Z before shading and passed. |
| PreZSamplesFailingS | Discrete, Streaming | Items | Number of samples tested for Z before shading and failed stencil test. |
| PreZSamplesFailingZ | Discrete, Streaming | Items | Number of samples tested for Z before shading and failed Z test. |
| PostZSamplesPassing | Discrete, Streaming | Items | Number of samples tested for Z after shading and passed. |
| PostZSamplesFailingS | Discrete, Streaming | Items | Number of samples tested for Z after shading and failed stencil test. |
| PostZSamplesFailingZ | Discrete, Streaming | Items | Number of samples tested for Z after shading and failed Z test. |
| ZUnitStalled | Discrete, Streaming | Percentage | The percentage of GPUTime the depth buffer spends waiting for the color buffer to be ready to accept data. High figures here indicate a bottleneck in color buffer operations. |
| ZUnitStalledCycles | Discrete, Streaming | Cycles | Number of GPU cycles the depth buffer spends waiting for the color buffer to be ready to accept data. Larger numbers indicate a bottleneck in color buffer operations. |
| DBMemRead | Discrete, Streaming | Bytes | Number of bytes read from the depth buffer. |
| DBMemWritten | Discrete, Streaming | Bytes | Number of bytes written to the depth buffer. |
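
Several percentage counters in this group have matching count counters. The sketch below shows one plausible way the two relate, assuming that each percentage is simply the accepted (or culled) count divided by the sum of the two complementary counts. That relationship is an assumption made for illustration; the exact formulas used by GPUPerfAPI are not stated here, and the helper names are hypothetical.

```python
def percentage(part: float, whole: float) -> float:
    """Return part/whole as a percentage, or 0.0 when nothing was counted."""
    return 100.0 * part / whole if whole else 0.0


def hiz_tile_acceptance_pct(accepted_count: int, rejected_count: int) -> float:
    # Assumed relation: HiZTilesAcceptedCount / (accepted + rejected).
    return percentage(accepted_count, accepted_count + rejected_count)


def prez_quad_cull_pct(culled_count: int, surviving_count: int) -> float:
    # Assumed relation: PreZQuadsCulledCount / (culled + surviving).
    return percentage(culled_count, culled_count + surviving_count)


print(hiz_tile_acceptance_pct(accepted_count=900, rejected_count=100))
print(prez_quad_cull_pct(culled_count=250, surviving_count=750))
```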

MemoryCache Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| L0CacheHit | Discrete, Streaming | Percentage | The percentage of read requests that hit the data in the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
| L0CacheRequestCount | Discrete, Streaming | Items | The number of read requests made to the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. |
| L0CacheHitCount | Discrete, Streaming | Items | The number of read requests which result in a cache hit from the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. |
| L0CacheMissCount | Discrete, Streaming | Items | The number of read requests which result in a cache miss from the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. |
| ScalarCacheHit | Discrete, Streaming | Percentage | The percentage of read requests made from executing shader code that hit the data in the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
| ScalarCacheRequestCount | Discrete, Streaming | Items | The number of read requests made from executing shader code to the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. |
| ScalarCacheHitCount | Discrete, Streaming | Items | The number of read requests made from executing shader code which result in a cache hit from the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. |
| ScalarCacheMissCount | Discrete, Streaming | Items | The number of read requests made from executing shader code which result in a cache miss from the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. |
| InstCacheHit | Discrete, Streaming | Percentage | The percentage of read requests made that hit the data in the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
| InstCacheRequestCount | Discrete, Streaming | Items | The number of read requests made to the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. |
| InstCacheHitCount | Discrete, Streaming | Items | The number of read requests which result in a cache hit from the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. |
| InstCacheMissCount | Discrete, Streaming | Items | The number of read requests which result in a cache miss from the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. |
| L1CacheHit | Discrete, Streaming | Percentage | The percentage of read or write requests that hit the data in the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
| L1CacheRequestCount | Discrete | Items | The number of read or write requests made to the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size. |
| L1CacheHitCount | Discrete, Streaming | Items | The number of read or write requests which result in a cache hit from the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size. |
| L1CacheMissCount | Discrete, Streaming | Items | The number of read or write requests which result in a cache miss from the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size. |
| L2CacheHit | Discrete, Streaming | Percentage | The percentage of read or write requests that hit the data in the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
| L2CacheMiss | Discrete, Streaming | Percentage | The percentage of read or write requests that miss the data in the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. Value range: 0% (optimal) to 100% (all miss). |
| L2CacheRequestCount | Discrete, Streaming | Items | The number of read or write requests made to the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. |
| L2CacheHitCount | Discrete, Streaming | Items | The number of read or write requests which result in a cache hit from the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. |
| L2CacheMissCount | Discrete, Streaming | Items | The number of read or write requests which result in a cache miss from the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. |
| L0TagConflictReadStalledCycles | Discrete, Streaming | Items | The number of cycles read operations from the L0 cache are stalled due to tag conflicts. |
| L0TagConflictWriteStalledCycles | Discrete, Streaming | Items | The number of cycles write operations to the L0 cache are stalled due to tag conflicts. |
| L0TagConflictAtomicStalledCycles | Discrete, Streaming | Items | The number of cycles atomic operations on the L0 cache are stalled due to tag conflicts. |
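
The hit, miss, and request counters can be cross-checked against each other. The sketch below assumes that requests equal hits plus misses and that the hit percentage is hits divided by requests; both are assumptions for illustration rather than documented formulas. The per-request sizes come from the descriptions above, while the helper name and the short cache labels are hypothetical.

```python
# Request sizes in bytes, taken from the counter descriptions above.
REQUEST_BYTES = {"L0": 128, "Scalar": 64, "Inst": 64, "L1": 128, "L2": 128}


def cache_summary(cache: str, hit_count: int, miss_count: int) -> dict:
    """Derive request total, hit rate, and requested bytes from raw counts."""
    requests = hit_count + miss_count  # assumed: every request is a hit or a miss
    hit_pct = 100.0 * hit_count / requests if requests else 0.0
    return {
        "requests": requests,
        "hit_pct": hit_pct,
        "bytes_requested": requests * REQUEST_BYTES[cache],
    }


print(cache_summary("L0", hit_count=750, miss_count=250))
```

Multiplying the request count by the request size gives a rough upper bound on the data the shader asked for, which can be compared against FetchSize to see how much traffic the caches absorbed.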

GlobalMemory Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| FetchSize | Discrete, Streaming | Bytes | The total bytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
| WriteSize | Discrete, Streaming | Bytes | The total bytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
| MemUnitBusy | Discrete, Streaming | Percentage | The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound). |
| MemUnitBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles the memory unit is active. The result includes the stall time (MemUnitStalledCycles). This is measured with all extra fetches and writes and any cache or memory effects taken into account. |
| MemUnitStalled | Discrete, Streaming | Percentage | The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad). |
| MemUnitStalledCycles | Discrete, Streaming | Cycles | Number of GPU cycles the memory unit is stalled. |
| WriteUnitStalled | Discrete, Streaming | Percentage | The percentage of GPUTime the Write unit is stalled. Value range: 0% to 100% (bad). |
| WriteUnitStalledCycles | Discrete, Streaming | Cycles | Number of GPU cycles the Write unit is stalled. |
| LocalVidMemBytes | Discrete | Bytes | Number of bytes read from or written to the Infinity Cache (if available) or local video memory. |
| PcieBytes | Discrete, Streaming | Bytes | Number of bytes sent and received over the PCIe bus. |
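
Byte counters become easier to interpret when combined with a duration. The sketch below estimates an average video memory bandwidth from FetchSize and WriteSize, assuming both were sampled over the same command as a GPUTime value reported in nanoseconds. The helper is hypothetical and yields only a coarse average, not a peak figure.

```python
def memory_bandwidth_gb_per_s(fetch_size_bytes: int, write_size_bytes: int,
                              gpu_time_ns: float) -> float:
    """Average bandwidth in GB/s (1 GB = 1e9 bytes) over one sampled command."""
    if gpu_time_ns <= 0:
        return 0.0
    # Bytes per nanosecond is numerically equal to gigabytes per second.
    return (fetch_size_bytes + write_size_bytes) / gpu_time_ns


print(memory_bandwidth_gb_per_s(3_000_000, 1_000_000, gpu_time_ns=20_000))
```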

RayTracing Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| RayTriTests | Discrete, Streaming | Items | The number of ray-triangle intersection tests. |
| RayBoxTests | Discrete, Streaming | Items | The number of ray-box intersection tests. |
| TotalRayTests | Discrete, Streaming | Items | The total number of ray intersection tests, including both box and triangle intersections. |
| RayTestsPerWave | Discrete, Streaming | Items | The number of ray intersection tests per wave. |
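
Because TotalRayTests covers both box and triangle tests, the split between the two can be derived from the individual counters. The following sketch, using a hypothetical helper name, computes the total and the share of box tests from RayTriTests and RayBoxTests values.

```python
def ray_test_breakdown(ray_tri_tests: int, ray_box_tests: int) -> tuple:
    """Return (total tests, percentage of tests that were ray-box tests)."""
    total = ray_tri_tests + ray_box_tests
    box_pct = 100.0 * ray_box_tests / total if total else 0.0
    return total, box_pct


total, box_pct = ray_test_breakdown(ray_tri_tests=2_000, ray_box_tests=6_000)
print(total, box_pct)
```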

WaveDistribution Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| WaveOccupancyPct | Streaming | Percentage | The percentage of the maximum wavefront occupancy that is currently being used. |

WaveOccupancyLimiters Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| HSLimitedByVgpr | Streaming | Percentage | The percentage of HS wave scheduling requests that are limited by VGPR availability. |
| HSLimitedByLds | Streaming | Percentage | The percentage of HS wave scheduling requests that are limited by LDS availability. |
| HSLimitedByScratch | Streaming | Percentage | The percentage of HS wave scheduling requests that are limited by scratch space availability. |
| HSLimitedByBarriers | Streaming | Percentage | The percentage of HS wave scheduling requests that are limited by barriers. |
| GSLimitedByVgpr | Streaming | Percentage | The percentage of GS wave scheduling requests that are limited by VGPR availability. |
| GSLimitedByLds | Streaming | Percentage | The percentage of GS wave scheduling requests that are limited by LDS availability. |
| GSLimitedByScratch | Streaming | Percentage | The percentage of GS wave scheduling requests that are limited by scratch space availability. |
| PSLimitedByLds | Streaming | Percentage | The percentage of PS wave scheduling requests that are limited by LDS availability. |
| PSLimitedByVgpr | Streaming | Percentage | The percentage of PS wave scheduling requests that are limited by VGPR availability. |
| PSLimitedByScratch | Streaming | Percentage | The percentage of PS wave scheduling requests that are limited by scratch space availability. |
| CSLimitedByLds | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by LDS availability. |
| CSLimitedByVgpr | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by VGPR availability. |
| CSLimitedByScratch | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by scratch space availability. |
| CSLimitedByBarriers | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by barriers. |
| CSLimitedByThreadGroupLimit | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by the thread group limit. |
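
One way to use these counters is to rank the limiters for a single shader stage and focus on the largest. The sketch below assumes the sampled values have already been collected into a dictionary keyed by counter name; the dictionary layout, the threshold, and the function name are illustrative assumptions, not part of GPUPerfAPI.

```python
def dominant_limiters(sample: dict, stage: str, threshold_pct: float = 10.0) -> list:
    """List (resource, percentage) pairs for one stage, largest first."""
    prefix = f"{stage}LimitedBy"
    hits = [
        (name[len(prefix):], value)
        for name, value in sample.items()
        if name.startswith(prefix) and value >= threshold_pct
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Made-up values for illustration only.
sample = {
    "CSLimitedByLds": 4.0,
    "CSLimitedByVgpr": 62.5,
    "CSLimitedByScratch": 0.0,
    "CSLimitedByBarriers": 12.0,
    "CSLimitedByThreadGroupLimit": 1.5,
    "PSLimitedByVgpr": 30.0,
}
print(dominant_limiters(sample, "CS"))
```

In this made-up sample, VGPR availability dominates the CS stage, which would suggest looking at register pressure in the compute shader first.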

Graphics Performance Counters for RDNA2

RDNA2 Counters

Timing Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| GPUTime | Discrete | Nanoseconds | Time this API command took to execute on the GPU in nanoseconds from the time the previous command reached the bottom of the pipeline (BOP) to the time this command reaches the bottom of the pipeline (BOP). Does not include time that draw calls are processed in parallel. |
| ExecutionDuration | Discrete | Nanoseconds | GPU command execution duration in nanoseconds, from the time the command enters the top of the pipeline (TOP) to the time the command reaches the bottom of the pipeline (BOP). Does not include time that draw calls are processed in parallel. |
| ExecutionStart | Discrete | Nanoseconds | GPU command execution start time in nanoseconds. This is the time the command enters the top of the pipeline (TOP). |
| ExecutionEnd | Discrete | Nanoseconds | GPU command execution end time in nanoseconds. This is the time the command reaches the bottom of the pipeline (BOP). |
| GPUBusy | Discrete, Streaming | Percentage | The percentage of time the GPU command processor was busy. |
| GPUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the GPU command processor was busy. |
| TessellatorBusy | Discrete, Streaming | Percentage | The percentage of time the tessellation engine is busy. |
| TessellatorBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the tessellation engine is busy. |
| VsGsBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has VS or GS work to do in a VS-[GS-]PS pipeline. |
| VsGsBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has VS or GS work to do in a VS-[GS-]PS pipeline. |
| VsGsTime | Discrete | Nanoseconds | Time VS or GS are busy in nanoseconds in a VS-[GS-]PS pipeline. |
| PreTessellationBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has VS and HS work to do in a pipeline that uses tessellation. |
| PreTessellationBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has VS and HS work to do in a pipeline that uses tessellation. |
| PreTessellationTime | Discrete | Nanoseconds | Time VS and HS are busy in nanoseconds in a pipeline that uses tessellation. |
| PostTessellationBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has DS or GS work to do in a pipeline that uses tessellation. |
| PostTessellationBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has DS or GS work to do in a pipeline that uses tessellation. |
| PostTessellationTime | Discrete | Nanoseconds | Time DS or GS are busy in nanoseconds in a pipeline that uses tessellation. |
| PSBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has pixel shader work to do. |
| PSBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has pixel shader work to do. |
| PSTime | Discrete | Nanoseconds | Time pixel shaders are busy in nanoseconds. |
| CSBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has compute shader work to do. |
| CSBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has compute shader work to do. |
| CSTime | Discrete | Nanoseconds | Time compute shaders are busy in nanoseconds. |
| PrimitiveAssemblyBusy | Discrete | Percentage | The percentage of GPUTime that primitive assembly (clipping and culling) is busy. High values may be caused by having many small primitives; mid to low values may indicate pixel shader or output buffer bottleneck. |
| PrimitiveAssemblyBusyCycles | Discrete | Cycles | Number of GPU cycles the primitive assembly (clipping and culling) is busy. High values may be caused by having many small primitives; mid to low values may indicate pixel shader or output buffer bottleneck. |
| TexUnitBusy | Discrete, Streaming | Percentage | The percentage of GPUTime the texture unit is active. This is measured with all extra fetches and any cache or memory effects taken into account. |
| TexUnitBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles the texture unit is active. This is measured with all extra fetches and any cache or memory effects taken into account. |
| DepthStencilTestBusy | Discrete, Streaming | Percentage | Percentage of time GPU spent performing depth and stencil tests relative to GPUBusy. |
| DepthStencilTestBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles spent performing depth and stencil tests. |
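
The Timing counters are a natural first pass when deciding which stage to investigate. The sketch below ranks the "Busy" percentages of one sample and derives a duration from ExecutionStart and ExecutionEnd, which by the definitions above span TOP to BOP. The dictionary layout and helper names are assumptions for illustration. Because pipeline stages overlap, the busy values are ranked rather than summed.

```python
def duration_from_timestamps_ns(execution_start_ns: int, execution_end_ns: int) -> int:
    """TOP-to-BOP duration, the same interval ExecutionDuration describes."""
    return execution_end_ns - execution_start_ns


def rank_busy_stages(sample: dict) -> list:
    """Sort 'Busy' percentage counters from most to least busy."""
    busy = {name: value for name, value in sample.items() if name.endswith("Busy")}
    return sorted(busy.items(), key=lambda item: item[1], reverse=True)


# Made-up values for illustration only.
sample = {"VsGsBusy": 35.0, "PSBusy": 80.0, "CSBusy": 0.0, "TexUnitBusy": 55.0}
print(duration_from_timestamps_ns(1_000_000, 1_250_000))
print(rank_busy_stages(sample))
```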

VertexGeometry Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| GSVerticesOut | Discrete, Streaming | Items | The number of vertices output by the GS. |
| VsGsVALUInstCount | Discrete | Items | Average number of vector ALU instructions executed for the VS and GS in a VS-[GS-]PS pipeline. Affected by flow control. |
| VsGsSALUInstCount | Discrete | Items | Average number of scalar ALU instructions executed for the VS and GS. Affected by flow control. |
| VsGsVALUBusy | Discrete | Percentage | The percentage of GPUTime vector ALU instructions are being processed for the VS and GS. |
| VsGsVALUBusyCycles | Discrete | Cycles | Number of GPU cycles where vector ALU instructions are being processed for the VS and GS. |
| VsGsSALUBusy | Discrete | Percentage | The percentage of GPUTime scalar ALU instructions are being processed for the VS and GS. |
| VsGsSALUBusyCycles | Discrete | Cycles | Number of GPU cycles where scalar ALU instructions are being processed for the VS and GS. |

PreTessellation Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| PreTessVALUInstCount | Discrete, Streaming | Items | Average number of vector ALU instructions executed for the VS and HS in a pipeline that uses tessellation. Affected by flow control. |
| PreTessSALUInstCount | Discrete, Streaming | Items | Average number of scalar ALU instructions executed for the VS and HS in a pipeline that uses tessellation. Affected by flow control. |
| PreTessVALUBusy | Discrete, Streaming | Percentage | The percentage of GPUTime vector ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation. |
| PreTessVALUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles where vector ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation. |
| PreTessSALUBusy | Discrete, Streaming | Percentage | The percentage of GPUTime scalar ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation. |
| PreTessSALUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles where scalar ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation. |
| PreTessVerticesIn | Discrete, Streaming | Items | The number of vertices processed by the VS and HS when using tessellation. |

PostTessellation Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| PostTessPrimsOut | Discrete, Streaming | Items | The number of primitives output by the DS and GS when using tessellation. |
| PostTessVALUInstCount | Discrete, Streaming | Items | Average number of vector ALU instructions executed for the DS and GS in a pipeline that uses tessellation. Affected by flow control. |
| PostTessSALUInstCount | Discrete | Items | Average number of scalar ALU instructions executed for the DS and GS in a pipeline that uses tessellation. Affected by flow control. |
| PostTessVALUBusy | Discrete, Streaming | Percentage | The percentage of GPUTime vector ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation. |
| PostTessVALUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles where vector ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation. |
| PostTessSALUBusy | Discrete, Streaming | Percentage | The percentage of GPUTime scalar ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation. |
| PostTessSALUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles where scalar ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation. |

PrimitiveAssembly Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| PrimitivesIn | Discrete, Streaming | Items | The number of primitives received by the hardware. This includes primitives generated by tessellation. |
| CulledPrims | Discrete | Items | The number of culled primitives. Typical reasons include scissor, the primitive having zero area, and back or front face culling. |
| ClippedPrims | Discrete, Streaming | Items | The number of primitives that required one or more clipping operations due to intersecting the view volume or user clip planes. |
| PAStalledOnRasterizer | Discrete, Streaming | Percentage | Percentage of GPUTime that primitive assembly waits for rasterization to be ready to accept data. This roughly indicates for what percentage of time the pipeline is bottlenecked by pixel operations. |
| PAStalledOnRasterizerCycles | Discrete, Streaming | Cycles | Number of GPU cycles the primitive assembly waits for rasterization to be ready to accept data. Indicates the number of GPU cycles the pipeline is bottlenecked by pixel operations. |

PixelShader Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| PSPixelsOut | Discrete, Streaming | Items | Pixels exported from shader to color buffers. Does not include killed or alpha tested pixels; if there are multiple render targets, each render target receives one export, so this will be 2 for 1 pixel written to two RTs. |
| PSExportStalls | Discrete, Streaming | Percentage | Pixel shader output stalls. Percentage of GPUBusy. Should be zero for PS or further upstream limited cases; if not zero, indicates a bottleneck in late Z testing or in the color buffer. |
| PSExportStallsCycles | Discrete, Streaming | Cycles | Number of GPU cycles the pixel shader output stalls. Should be zero for PS or further upstream limited cases; if not zero, indicates a bottleneck in late Z testing or in the color buffer. |

ComputeShader Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| CSThreadGroupsLaunched | Discrete, Streaming | Items | Total number of thread groups launched. |
| CSWavefrontsLaunched | Discrete, Streaming | Items | The total number of wavefronts launched for the CS. |
| CSThreadsLaunched | Discrete, Streaming | Items | The number of CS threads launched and processed by the hardware. |
| CSThreadGroupSize | Discrete | Items | The number of CS threads within each thread group. |
| CSVALUInsts | Discrete | Items | The average number of vector ALU instructions executed per work-item (affected by flow control). |
| CSVALUUtilization | Discrete | Percentage | The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of the wave size. Value range: 0% (bad), 100% (ideal - no thread divergence). |
| CSSALUInsts | Discrete | Items | The average number of scalar ALU instructions executed per work-item (affected by flow control). |
| CSVFetchInsts | Discrete | Items | The average number of vector fetch instructions from the video memory executed per work-item (affected by flow control). |
| CSSFetchInsts | Discrete | Items | The average number of scalar fetch instructions from the video memory executed per work-item (affected by flow control). |
| CSVWriteInsts | Discrete | Items | The average number of vector write instructions to the video memory executed per work-item (affected by flow control). |
| CSGDSInsts | Discrete | Items | The average number of GDS read or GDS write instructions executed per work-item (affected by flow control). |
| CSLDSInsts | Discrete | Items | The average number of LDS read/write instructions executed per work-item (affected by flow control). |
| CSALUStalledByLDS | Discrete | Percentage | The percentage of GPUTime ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad). |
| CSALUStalledByLDSCycles | Discrete | Cycles | The average number of GPU cycles each wavefront’s ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. |
| CSLDSBankConflict | Discrete, Streaming | Percentage | The percentage of GPUTime LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad). |
| CSLDSBankConflictCycles | Discrete, Streaming | Cycles | Number of GPU cycles the LDS is stalled by bank conflicts. Value range: 0 (optimal) to GPUBusyCycles (bad). |
| CSALUStalledByLDSPerWave | Streaming | Percentage | The average percentage of GPUTime each wavefront’s ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad). |

TextureUnit Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| TexTriFilteringPct | Discrete, Streaming | Percentage | Percentage of pixels that received trilinear filtering. Note that not all pixels for which trilinear filtering is enabled will receive it (e.g. if the texture is magnified). |
| TexTriFilteringCount | Discrete, Streaming | Items | Count of pixels that received trilinear filtering. Note that not all pixels for which trilinear filtering is enabled will receive it (e.g. if the texture is magnified). |
| NoTexTriFilteringCount | Discrete, Streaming | Items | Count of pixels that did not receive trilinear filtering. |
| TexVolFilteringPct | Discrete, Streaming | Percentage | Percentage of pixels that received volume filtering. |
| TexVolFilteringCount | Discrete, Streaming | Items | Count of pixels that received volume filtering. |
| NoTexVolFilteringCount | Discrete, Streaming | Items | Count of pixels that did not receive volume filtering. |
| TexAveAnisotropy | Discrete | Items | The average degree of anisotropy applied. A number between 1 and 16. The anisotropic filtering algorithm only applies samples where they are required (e.g. there will be no extra anisotropic samples if the view vector is perpendicular to the surface) so this can be much lower than the requested anisotropy. |

DepthAndStencil Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| HiZTilesAccepted | Discrete, Streaming | Percentage | Percentage of tiles accepted by HiZ that will be rendered to the depth or color buffers. |
| HiZTilesAcceptedCount | Discrete, Streaming | Items | Count of tiles accepted by HiZ that will be rendered to the depth or color buffers. |
| HiZTilesRejectedCount | Discrete, Streaming | Items | Count of tiles not accepted by HiZ. |
| PreZTilesDetailCulled | Discrete, Streaming | Percentage | Percentage of tiles rejected because the associated primitive had no contributing area. |
| PreZTilesDetailCulledCount | Discrete, Streaming | Items | Count of tiles rejected because the associated primitive had no contributing area. |
| PreZTilesDetailSurvivingCount | Discrete, Streaming | Items | Count of tiles surviving because the associated primitive had contributing area. |
| HiZQuadsCulled | Discrete | Percentage | Percentage of quads that did not have to continue on in the pipeline after HiZ. They may be written directly to the depth buffer, or culled completely. Consistently low values here may suggest that the Z-range is not being fully utilized. |
| HiZQuadsCulledCount | Discrete | Items | Count of quads that did not have to continue on in the pipeline after HiZ. They may be written directly to the depth buffer, or culled completely. Consistently low values here may suggest that the Z-range is not being fully utilized. |
| HiZQuadsAcceptedCount | Discrete, Streaming | Items | Count of quads that did continue on in the pipeline after HiZ. |
| PreZQuadsCulled | Discrete | Percentage | Percentage of quads rejected based on the detailZ and earlyZ tests. |
| PreZQuadsCulledCount | Discrete | Items | Count of quads rejected based on the detailZ and earlyZ tests. |
| PreZQuadsSurvivingCount | Discrete | Items | Count of quads surviving detailZ and earlyZ tests. |
| PostZQuads | Discrete | Percentage | Percentage of quads for which the pixel shader will run and may be postZ tested. |
| PostZQuadCount | Discrete, Streaming | Items | Count of quads for which the pixel shader will run and may be postZ tested. |
| PreZSamplesPassing | Discrete, Streaming | Items | Number of samples tested for Z before shading and passed. |
| PreZSamplesFailingS | Discrete, Streaming | Items | Number of samples tested for Z before shading and failed stencil test. |
| PreZSamplesFailingZ | Discrete, Streaming | Items | Number of samples tested for Z before shading and failed Z test. |
| PostZSamplesPassing | Discrete, Streaming | Items | Number of samples tested for Z after shading and passed. |
| PostZSamplesFailingS | Discrete, Streaming | Items | Number of samples tested for Z after shading and failed stencil test. |
| PostZSamplesFailingZ | Discrete, Streaming | Items | Number of samples tested for Z after shading and failed Z test. |
| ZUnitStalled | Discrete, Streaming | Percentage | The percentage of GPUTime the depth buffer spends waiting for the color buffer to be ready to accept data. High figures here indicate a bottleneck in color buffer operations. |
| ZUnitStalledCycles | Discrete, Streaming | Cycles | Number of GPU cycles the depth buffer spends waiting for the color buffer to be ready to accept data. Larger numbers indicate a bottleneck in color buffer operations. |
| DBMemRead | Discrete, Streaming | Bytes | Number of bytes read from the depth buffer. |
| DBMemWritten | Discrete, Streaming | Bytes | Number of bytes written to the depth buffer. |

ColorBuffer Group

Counter Name | Sample Type | Usage | Brief Description
CBMemRead | Discrete, Streaming | Bytes | Number of bytes read from the color buffer.
CBColorAndMaskRead | Discrete, Streaming | Bytes | Total number of bytes read from the color and mask buffers.
CBMemWritten | Discrete, Streaming | Bytes | Number of bytes written to the color buffer.
CBColorAndMaskWritten | Discrete, Streaming | Bytes | Total number of bytes written to the color and mask buffers.
CBSlowPixelPct | Discrete, Streaming | Percentage | Percentage of pixels written to the color buffer using a half-rate or quarter-rate format.
CBSlowPixelCount | Discrete, Streaming | Items | Number of pixels written to the color buffer using a half-rate or quarter-rate format.

MemoryCache Group

Counter Name | Sample Type | Usage | Brief Description
L0CacheHit | Discrete, Streaming | Percentage | The percentage of read requests that hit the data in the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal).
L0CacheRequestCount | Discrete, Streaming | Items | The number of read requests made to the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size.
L0CacheHitCount | Discrete, Streaming | Items | The number of read requests which result in a cache hit from the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size.
L0CacheMissCount | Discrete, Streaming | Items | The number of read requests which result in a cache miss from the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size.
ScalarCacheHit | Discrete, Streaming | Percentage | The percentage of read requests made from executing shader code that hit the data in the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. Value range: 0% (no hit) to 100% (optimal).
ScalarCacheRequestCount | Discrete, Streaming | Items | The number of read requests made from executing shader code to the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size.
ScalarCacheHitCount | Discrete, Streaming | Items | The number of read requests made from executing shader code which result in a cache hit from the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size.
ScalarCacheMissCount | Discrete, Streaming | Items | The number of read requests made from executing shader code which result in a cache miss from the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size.
InstCacheHit | Discrete, Streaming | Percentage | The percentage of read requests made that hit the data in the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. Value range: 0% (no hit) to 100% (optimal).
InstCacheRequestCount | Discrete, Streaming | Items | The number of read requests made to the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size.
InstCacheHitCount | Discrete, Streaming | Items | The number of read requests which result in a cache hit from the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size.
InstCacheMissCount | Discrete, Streaming | Items | The number of read requests which result in a cache miss from the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size.
L1CacheHit | Discrete, Streaming | Percentage | The percentage of read or write requests that hit the data in the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal).
L1CacheRequestCount | Discrete | Items | The number of read or write requests made to the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size.
L1CacheHitCount | Discrete, Streaming | Items | The number of read or write requests which result in a cache hit from the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size.
L1CacheMissCount | Discrete, Streaming | Items | The number of read or write requests which result in a cache miss from the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size.
L2CacheHit | Discrete, Streaming | Percentage | The percentage of read or write requests that hit the data in the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal).
L2CacheMiss | Discrete, Streaming | Percentage | The percentage of read or write requests that miss the data in the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. Value range: 0% (optimal) to 100% (all miss).
L2CacheRequestCount | Discrete, Streaming | Items | The number of read or write requests made to the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size.
L2CacheHitCount | Discrete, Streaming | Items | The number of read or write requests which result in a cache hit from the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size.
L2CacheMissCount | Discrete, Streaming | Items | The number of read or write requests which result in a cache miss from the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size.
L0TagConflictReadStalledCycles | Discrete, Streaming | Items | The number of cycles read operations from the L0 cache are stalled due to tag conflicts.
L0TagConflictWriteStalledCycles | Discrete, Streaming | Items | The number of cycles write operations to the L0 cache are stalled due to tag conflicts.
L0TagConflictAtomicStalledCycles | Discrete, Streaming | Items | The number of cycles atomic operations on the L0 cache are stalled due to tag conflicts.
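
The hit percentages in this group describe the share of requests that hit, so they can be cross-checked against the corresponding count counters. The sketch below is illustrative only: the function names and the sample values are hypothetical, and it assumes the counter values have already been read back as plain numbers. It also uses the fixed request size stated in the descriptions (128 bytes for L0) to estimate the bytes requested.

```python
def cache_hit_pct(hit_count: float, request_count: float) -> float:
    """Hits as a percentage of requests; 0.0 when no requests were made."""
    if request_count == 0:
        return 0.0
    return 100.0 * hit_count / request_count


def bytes_requested(request_count: int, request_size_bytes: int) -> int:
    """Estimate of bytes requested, given the fixed per-request size."""
    return request_count * request_size_bytes


# Hypothetical sample values for L0CacheHitCount and L0CacheRequestCount.
l0_hits, l0_requests = 750, 1000
print(cache_hit_pct(l0_hits, l0_requests))   # hit percentage for the sample
print(bytes_requested(l0_requests, 128))     # L0 requests are 128 bytes each
```

The same helper applies to the Scalar and Instruction caches with a 64-byte request size, and to L1 and L2 with 128 bytes.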

GlobalMemory Group

Counter Name | Sample Type | Usage | Brief Description
FetchSize | Discrete, Streaming | Bytes | The total bytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account.
WriteSize | Discrete, Streaming | Bytes | The total bytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account.
MemUnitBusy | Discrete, Streaming | Percentage | The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound).
MemUnitBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles the memory unit is active. The result includes the stall time (MemUnitStalledCycles). This is measured with all extra fetches and writes and any cache or memory effects taken into account.
MemUnitStalled | Discrete, Streaming | Percentage | The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad).
MemUnitStalledCycles | Discrete, Streaming | Cycles | Number of GPU cycles the memory unit is stalled.
WriteUnitStalled | Discrete, Streaming | Percentage | The percentage of GPUTime the Write unit is stalled. Value range: 0% to 100% (bad).
WriteUnitStalledCycles | Discrete, Streaming | Cycles | Number of GPU cycles the Write unit is stalled.
LocalVidMemBytes | Discrete | Bytes | Number of bytes read from or written to the Infinity Cache (if available) or local video memory.
PcieBytes | Discrete | Bytes | Number of bytes sent and received over the PCIe bus.
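
Because MemUnitBusyCycles includes the stall time reported by MemUnitStalledCycles, the two can be combined to see how much of the busy time was spent stalled. The following is a minimal sketch under that reading; the function name, the returned keys, and the sample values are hypothetical, and the clamp is only a guard against sampling noise.

```python
def mem_unit_breakdown(busy_cycles: int, stalled_cycles: int) -> dict:
    """Split memory-unit busy cycles into stalled and non-stalled parts."""
    # Clamp so a noisy sample cannot yield a negative non-stalled count.
    stalled = min(stalled_cycles, busy_cycles)
    non_stalled = busy_cycles - stalled
    stalled_share = 100.0 * stalled / busy_cycles if busy_cycles else 0.0
    return {"non_stalled_cycles": non_stalled, "stalled_share_pct": stalled_share}


# Hypothetical values for MemUnitBusyCycles and MemUnitStalledCycles.
print(mem_unit_breakdown(1000, 250))
```

A high stalled share suggests following the advice in the MemUnitStalled description: reduce the number or size of fetches and writes where possible.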

RayTracing Group

Counter Name | Sample Type | Usage | Brief Description
RayTriTests | Discrete, Streaming | Items | The number of ray triangle intersection tests.
RayBoxTests | Discrete, Streaming | Items | The number of ray box intersection tests.
TotalRayTests | Discrete, Streaming | Items | Total number of ray intersection tests, includes both box and triangle intersections.
RayTestsPerWave | Discrete, Streaming | Items | The number of ray intersection tests per wave.
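
Since TotalRayTests covers both box and triangle intersections, the split between the two can be derived from RayBoxTests and RayTriTests. The sketch below assumes the values have been read back as integers; the function name and sample values are hypothetical.

```python
def ray_test_split(box_tests: int, tri_tests: int) -> dict:
    """Total intersection tests and the percentage that were box tests."""
    total = box_tests + tri_tests
    box_pct = 100.0 * box_tests / total if total else 0.0
    return {"total": total, "box_pct": box_pct}


# Hypothetical values for RayBoxTests and RayTriTests.
print(ray_test_split(box_tests=900, tri_tests=100))
```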

WaveDistribution Group

Counter Name | Sample Type | Usage | Brief Description
WaveOccupancyPct | Streaming | Percentage | The percentage of the maximum wavefront occupancy that is currently being used.

WaveOccupancyLimiters Group

Counter Name | Sample Type | Usage | Brief Description
LSHSLimitedByVgpr | Streaming | Percentage | The percentage of LS and HS wave scheduling requests that are limited by VGPR availability.
LSHSLimitedByLds | Streaming | Percentage | The percentage of LS and HS wave scheduling requests that are limited by LDS availability.
LSHSLimitedByScratch | Streaming | Percentage | The percentage of LS and HS wave scheduling requests that are limited by scratch space availability.
HSLimitedByBarriers | Streaming | Percentage | The percentage of HS wave scheduling requests that are limited by barriers.
ESGSLimitedByVgpr | Streaming | Percentage | The percentage of ES and GS wave scheduling requests that are limited by VGPR availability.
ESGSLimitedByLds | Streaming | Percentage | The percentage of ES and GS wave scheduling requests that are limited by LDS availability.
ESGSLimitedByScratch | Streaming | Percentage | The percentage of ES and GS wave scheduling requests that are limited by scratch space availability.
VSLimitedByVgpr | Streaming | Percentage | The percentage of VS wave scheduling requests that are limited by VGPR availability.
VSLimitedByScratch | Streaming | Percentage | The percentage of VS wave scheduling requests that are limited by scratch space availability.
VSLimitedByExport | Streaming | Percentage | The percentage of cycles that VS waves are stalled due to export space availability.
PSLimitedByLds | Streaming | Percentage | The percentage of PS wave scheduling requests that are limited by LDS availability.
PSLimitedByVgpr | Streaming | Percentage | The percentage of PS wave scheduling requests that are limited by VGPR availability.
PSLimitedByScratch | Streaming | Percentage | The percentage of PS wave scheduling requests that are limited by scratch space availability.
CSLimitedByLds | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by LDS availability.
CSLimitedByVgpr | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by VGPR availability.
CSLimitedByScratch | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by scratch space availability.
CSLimitedByBarriers | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by barriers.
CSLimitedByThreadGroupLimit | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by the thread group limit.
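
The counters in this group follow a "<stage>LimitedBy<resource>" naming pattern, so one way to read them is to pick the largest limiter per stage. The sketch below is not part of GPUPerfAPI; the function name and the sample values are hypothetical, and it assumes the percentages have already been collected into a dictionary keyed by counter name.

```python
from typing import Dict, Tuple


def dominant_limiters(samples: Dict[str, float]) -> Dict[str, Tuple[str, float]]:
    """Map each stage prefix to its (resource, percentage) with the highest value."""
    best: Dict[str, Tuple[str, float]] = {}
    for name, pct in samples.items():
        stage, sep, resource = name.partition("LimitedBy")
        if not sep:
            continue  # not an occupancy-limiter counter
        if stage not in best or pct > best[stage][1]:
            best[stage] = (resource, pct)
    return best


# Hypothetical streaming samples.
samples = {
    "CSLimitedByLds": 5.0,
    "CSLimitedByVgpr": 40.0,
    "PSLimitedByVgpr": 10.0,
    "PSLimitedByScratch": 0.0,
}
print(dominant_limiters(samples))
```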

Graphics Performance Counters for RDNA

RDNA Counters

Timing Group

Counter Name | Sample Type | Usage | Brief Description
GPUTime | Discrete | Nanoseconds | Time this API command took to execute on the GPU in nanoseconds from the time the previous command reached the bottom of the pipeline (BOP) to the time this command reaches the bottom of the pipeline (BOP). Does not include time that draw calls are processed in parallel.
ExecutionDuration | Discrete | Nanoseconds | GPU command execution duration in nanoseconds, from the time the command enters the top of the pipeline (TOP) to the time the command reaches the bottom of the pipeline (BOP). Does not include time that draw calls are processed in parallel.
ExecutionStart | Discrete | Nanoseconds | GPU command execution start time in nanoseconds. This is the time the command enters the top of the pipeline (TOP).
ExecutionEnd | Discrete | Nanoseconds | GPU command execution end time in nanoseconds. This is the time the command reaches the bottom of the pipeline (BOP).
GPUBusy | Discrete, Streaming | Percentage | The percentage of time the GPU command processor was busy.
GPUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the GPU command processor was busy.
TessellatorBusy | Discrete, Streaming | Percentage | The percentage of time the tessellation engine is busy.
TessellatorBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the tessellation engine is busy.
VsGsBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has VS or GS work to do in a VS-[GS-]PS pipeline.
VsGsBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has VS or GS work to do in a VS-[GS-]PS pipeline.
VsGsTime | Discrete | Nanoseconds | Time VS or GS are busy in nanoseconds in a VS-[GS-]PS pipeline.
PreTessellationBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has VS and HS work to do in a pipeline that uses tessellation.
PreTessellationBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has VS and HS work to do in a pipeline that uses tessellation.
PreTessellationTime | Discrete | Nanoseconds | Time VS and HS are busy in nanoseconds in a pipeline that uses tessellation.
PostTessellationBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has DS or GS work to do in a pipeline that uses tessellation.
PostTessellationBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has DS or GS work to do in a pipeline that uses tessellation.
PostTessellationTime | Discrete | Nanoseconds | Time DS or GS are busy in nanoseconds in a pipeline that uses tessellation.
PSBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has pixel shader work to do.
PSBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has pixel shader work to do.
PSTime | Discrete | Nanoseconds | Time pixel shaders are busy in nanoseconds.
CSBusy | Discrete, Streaming | Percentage | The percentage of time the ShaderUnit has compute shader work to do.
CSBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles that the ShaderUnit has compute shader work to do.
CSTime | Discrete | Nanoseconds | Time compute shaders are busy in nanoseconds.
PrimitiveAssemblyBusy | Discrete | Percentage | The percentage of GPUTime that primitive assembly (clipping and culling) is busy. High values may be caused by having many small primitives; mid to low values may indicate pixel shader or output buffer bottleneck.
PrimitiveAssemblyBusyCycles | Discrete | Cycles | Number of GPU cycles the primitive assembly (clipping and culling) is busy. High values may be caused by having many small primitives; mid to low values may indicate pixel shader or output buffer bottleneck.
TexUnitBusy | Discrete, Streaming | Percentage | The percentage of GPUTime the texture unit is active. This is measured with all extra fetches and any cache or memory effects taken into account.
TexUnitBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles the texture unit is active. This is measured with all extra fetches and any cache or memory effects taken into account.
DepthStencilTestBusy | Discrete, Streaming | Percentage | Percentage of time GPU spent performing depth and stencil tests relative to GPUBusy.
DepthStencilTestBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles spent performing depth and stencil tests.
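
As recommended at the top of this page, the Timing group is a starting point for deciding which stage to investigate further. The sketch below ranks a set of "Busy" percentages from highest to lowest; the function name and the sample values are hypothetical. Because pipeline stages run in parallel, the values are expected to sum to more than 100 percent, so they are ranked rather than treated as shares of a whole.

```python
from typing import Dict, List, Tuple


def rank_busy_stages(busy_pct: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (counter name, percentage) pairs sorted from busiest to least busy."""
    return sorted(busy_pct.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values sampled for a single draw call.
sample = {"PSBusy": 80.0, "VsGsBusy": 35.0, "TexUnitBusy": 60.0}
for name, pct in rank_busy_stages(sample):
    print(f"{name}: {pct:.1f}%")
```

The stage at the top of the ranking indicates which counter group to enable next (for example, the PixelShader group when PSBusy dominates).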

VertexGeometry Group

Counter Name | Sample Type | Usage | Brief Description
GSVerticesOut | Discrete, Streaming | Items | The number of vertices output by the GS.

PreTessellation Group

Counter Name | Sample Type | Usage | Brief Description
PreTessVerticesIn | Discrete, Streaming | Items | The number of vertices processed by the VS and HS when using tessellation.
PreTessVALUInstCount | Discrete, Streaming | Items | Average number of vector ALU instructions executed for the VS and HS in a pipeline that uses tessellation. Affected by flow control.
PreTessSALUInstCount | Discrete, Streaming | Items | Average number of scalar ALU instructions executed for the VS and HS in a pipeline that uses tessellation. Affected by flow control.
PreTessVALUBusy | Discrete, Streaming | Percentage | The percentage of GPUTime vector ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation.
PreTessVALUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles where vector ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation.
PreTessSALUBusy | Discrete, Streaming | Percentage | The percentage of GPUTime scalar ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation.
PreTessSALUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles where scalar ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation.

PostTessellation Group

Counter Name | Sample Type | Usage | Brief Description
PostTessPrimsOut | Discrete, Streaming | Items | The number of primitives output by the DS and GS when using tessellation.
PostTessVALUInstCount | Discrete, Streaming | Items | Average number of vector ALU instructions executed for the DS and GS in a pipeline that uses tessellation. Affected by flow control.
PostTessSALUInstCount | Discrete | Items | Average number of scalar ALU instructions executed for the DS and GS in a pipeline that uses tessellation. Affected by flow control.
PostTessVALUBusy | Discrete, Streaming | Percentage | The percentage of GPUTime vector ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation.
PostTessVALUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles where vector ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation.
PostTessSALUBusy | Discrete, Streaming | Percentage | The percentage of GPUTime scalar ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation.
PostTessSALUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles where scalar ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation.

PrimitiveAssembly Group

Counter Name | Sample Type | Usage | Brief Description
PrimitivesIn | Discrete, Streaming | Items | The number of primitives received by the hardware. This includes primitives generated by tessellation.
CulledPrims | Discrete | Items | The number of culled primitives. Typical reasons include scissor, the primitive having zero area, and back or front face culling.
ClippedPrims | Discrete, Streaming | Items | The number of primitives that required one or more clipping operations due to intersecting the view volume or user clip planes.
PAStalledOnRasterizer | Discrete, Streaming | Percentage | Percentage of GPUTime that primitive assembly waits for rasterization to be ready to accept data. This roughly indicates for what percentage of time the pipeline is bottlenecked by pixel operations.
PAStalledOnRasterizerCycles | Discrete, Streaming | Cycles | Number of GPU cycles the primitive assembly waits for rasterization to be ready to accept data. Indicates the number of GPU cycles the pipeline is bottlenecked by pixel operations.

PixelShader Group

Counter Name | Sample Type | Usage | Brief Description
PSPixelsOut | Discrete, Streaming | Items | Pixels exported from shader to color buffers. Does not include killed or alpha tested pixels; if there are multiple render targets, each render target receives one export, so this will be 2 for 1 pixel written to two RTs.
PSExportStalls | Discrete, Streaming | Percentage | Pixel shader output stalls. Percentage of GPUBusy. Should be zero for PS or further upstream limited cases; if not zero, indicates a bottleneck in late Z testing or in the color buffer.
PSExportStallsCycles | Discrete, Streaming | Cycles | Number of GPU cycles the pixel shader output stalls. Should be zero for PS or further upstream limited cases; if not zero, indicates a bottleneck in late Z testing or in the color buffer.

ComputeShader Group

Counter Name | Sample Type | Usage | Brief Description
CSThreadGroupsLaunched | Discrete, Streaming | Items | Total number of thread groups launched.
CSWavefrontsLaunched | Discrete, Streaming | Items | The total number of wavefronts launched for the CS.
CSThreadsLaunched | Discrete, Streaming | Items | The number of CS threads launched and processed by the hardware.
CSThreadGroupSize | Discrete | Items | The number of CS threads within each thread group.
CSVALUInsts | Discrete | Items | The average number of vector ALU instructions executed per work-item (affected by flow control).
CSVALUUtilization | Discrete | Percentage | The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of the wave size. Value range: 0% (bad), 100% (ideal - no thread divergence).
CSSALUInsts | Discrete | Items | The average number of scalar ALU instructions executed per work-item (affected by flow control).
CSVFetchInsts | Discrete | Items | The average number of vector fetch instructions from the video memory executed per work-item (affected by flow control).
CSSFetchInsts | Discrete | Items | The average number of scalar fetch instructions from the video memory executed per work-item (affected by flow control).
CSVWriteInsts | Discrete | Items | The average number of vector write instructions to the video memory executed per work-item (affected by flow control).
CSVALUBusy | Discrete, Streaming | Percentage | The percentage of GPUTime vector ALU instructions are processed. Value range: 0% (bad) to 100% (optimal).
CSVALUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles where vector ALU instructions are processed.
CSSALUBusy | Discrete, Streaming | Percentage | The percentage of GPUTime scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal).
CSSALUBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles where scalar ALU instructions are processed.
CSGDSInsts | Discrete | Items | The average number of GDS read or GDS write instructions executed per work-item (affected by flow control).
CSLDSInsts | Discrete | Items | The average number of LDS read/write instructions executed per work-item (affected by flow control).
CSALUStalledByLDS | Discrete | Percentage | The percentage of GPUTime ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad).
CSALUStalledByLDSCycles | Discrete | Cycles | Number of GPU cycles each wavefront’s ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible.
CSLDSBankConflict | Discrete, Streaming | Percentage | The percentage of GPUTime LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad).
CSLDSBankConflictCycles | Discrete, Streaming | Cycles | Number of GPU cycles the LDS is stalled by bank conflicts. Value range: 0 (optimal) to GPUBusyCycles (bad).
CSALUStalledByLDSPerWave | Streaming | Percentage | The average percentage of GPUTime each wavefront’s ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad).
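
The CSVALUUtilization description names two causes of a low value: thread divergence, and a work-group size that is not a multiple of the wave size. The second cause can be estimated up front with simple arithmetic. The sketch below computes the best-case share of active lanes for a given work-group size; the function name is hypothetical, the wave size is passed in as an assumption about the shader's compilation mode, and the result ignores divergence entirely.

```python
import math


def max_lane_utilization_pct(group_size: int, wave_size: int) -> float:
    """Best-case percentage of active lanes when a work-group is split into waves."""
    if group_size <= 0 or wave_size <= 0:
        raise ValueError("group_size and wave_size must be positive")
    waves = math.ceil(group_size / wave_size)  # the last wave may be partly empty
    return 100.0 * group_size / (waves * wave_size)


print(max_lane_utilization_pct(96, 64))  # two waves, the second only half full
print(max_lane_utilization_pct(64, 32))  # exact multiple, no idle lanes
```

If the measured CSVALUUtilization is well below this bound, divergence within the wave is the more likely cause.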

TextureUnit Group

Counter Name | Sample Type | Usage | Brief Description
TexTriFilteringPct | Discrete, Streaming | Percentage | Percentage of pixels that received trilinear filtering. Note that not all pixels for which trilinear filtering is enabled will receive it (e.g. if the texture is magnified).
TexTriFilteringCount | Discrete, Streaming | Items | Count of pixels that received trilinear filtering. Note that not all pixels for which trilinear filtering is enabled will receive it (e.g. if the texture is magnified).
NoTexTriFilteringCount | Discrete, Streaming | Items | Count of pixels that did not receive trilinear filtering.
TexVolFilteringPct | Discrete, Streaming | Percentage | Percentage of pixels that received volume filtering.
TexVolFilteringCount | Discrete, Streaming | Items | Count of pixels that received volume filtering.
NoTexVolFilteringCount | Discrete, Streaming | Items | Count of pixels that did not receive volume filtering.
TexAveAnisotropy | Discrete | Items | The average degree of anisotropy applied. A number between 1 and 16. The anisotropic filtering algorithm only applies samples where they are required (e.g. there will be no extra anisotropic samples if the view vector is perpendicular to the surface) so this can be much lower than the requested anisotropy.

DepthAndStencil Group

Counter Name | Sample Type | Usage | Brief Description
HiZTilesAccepted | Discrete, Streaming | Percentage | Percentage of tiles that are accepted by HiZ and will be rendered to the depth or color buffers.
HiZTilesAcceptedCount | Discrete, Streaming | Items | Count of tiles that are accepted by HiZ and will be rendered to the depth or color buffers.
HiZTilesRejectedCount | Discrete, Streaming | Items | Count of tiles not accepted by HiZ.
PreZTilesDetailCulled | Discrete, Streaming | Percentage | Percentage of tiles rejected because the associated primitive had no contributing area.
PreZTilesDetailCulledCount | Discrete, Streaming | Items | Count of tiles rejected because the associated primitive had no contributing area.
PreZTilesDetailSurvivingCount | Discrete, Streaming | Items | Count of tiles surviving because the associated primitive had contributing area.
HiZQuadsCulled | Discrete | Percentage | Percentage of quads that did not have to continue on in the pipeline after HiZ. They may be written directly to the depth buffer, or culled completely. Consistently low values here may suggest that the Z-range is not being fully utilized.
HiZQuadsCulledCount | Discrete | Items | Count of quads that did not have to continue on in the pipeline after HiZ. They may be written directly to the depth buffer, or culled completely. Consistently low values here may suggest that the Z-range is not being fully utilized.
HiZQuadsAcceptedCount | Discrete, Streaming | Items | Count of quads that did continue on in the pipeline after HiZ.
PreZQuadsCulled | Discrete | Percentage | Percentage of quads rejected based on the detailZ and earlyZ tests.
PreZQuadsCulledCount | Discrete | Items | Count of quads rejected based on the detailZ and earlyZ tests.
PreZQuadsSurvivingCount | Discrete | Items | Count of quads surviving detailZ and earlyZ tests.
PostZQuads | Discrete | Percentage | Percentage of quads for which the pixel shader will run and may be postZ tested.
PostZQuadCount | Discrete, Streaming | Items | Count of quads for which the pixel shader will run and may be postZ tested.
PreZSamplesPassing | Discrete, Streaming | Items | Number of samples tested for Z before shading and passed.
PreZSamplesFailingS | Discrete, Streaming | Items | Number of samples tested for Z before shading and failed stencil test.
PreZSamplesFailingZ | Discrete, Streaming | Items | Number of samples tested for Z before shading and failed Z test.
PostZSamplesPassing | Discrete, Streaming | Items | Number of samples tested for Z after shading and passed.
PostZSamplesFailingS | Discrete, Streaming | Items | Number of samples tested for Z after shading and failed stencil test.
PostZSamplesFailingZ | Discrete, Streaming | Items | Number of samples tested for Z after shading and failed Z test.
ZUnitStalled | Discrete, Streaming | Percentage | The percentage of GPUTime the depth buffer spends waiting for the color buffer to be ready to accept data. High figures here indicate a bottleneck in color buffer operations.
ZUnitStalledCycles | Discrete, Streaming | Cycles | Number of GPU cycles the depth buffer spends waiting for the color buffer to be ready to accept data. Larger numbers indicate a bottleneck in color buffer operations.
DBMemRead | Discrete, Streaming | Bytes | Number of bytes read from the depth buffer.
DBMemWritten | Discrete, Streaming | Bytes | Number of bytes written to the depth buffer.

ColorBuffer Group

Counter Name | Sample Type | Usage | Brief Description
CBMemRead | Discrete, Streaming | Bytes | Number of bytes read from the color buffer.
CBColorAndMaskRead | Discrete, Streaming | Bytes | Total number of bytes read from the color and mask buffers.
CBMemWritten | Discrete, Streaming | Bytes | Number of bytes written to the color buffer.
CBColorAndMaskWritten | Discrete, Streaming | Bytes | Total number of bytes written to the color and mask buffers.
CBSlowPixelPct | Discrete, Streaming | Percentage | Percentage of pixels written to the color buffer using a half-rate or quarter-rate format.
CBSlowPixelCount | Discrete, Streaming | Items | Number of pixels written to the color buffer using a half-rate or quarter-rate format.

MemoryCache Group

Counter Name | Sample Type | Usage | Brief Description
L0CacheHit | Discrete, Streaming | Percentage | The percentage of read requests that hit the data in the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal).
L0CacheRequestCount | Discrete, Streaming | Items | The number of read requests made to the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size.
L0CacheHitCount | Discrete, Streaming | Items | The number of read requests which result in a cache hit from the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size.
L0CacheMissCount | Discrete, Streaming | Items | The number of read requests which result in a cache miss from the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size.
ScalarCacheHit | Discrete, Streaming | Percentage | The percentage of read requests made from executing shader code that hit the data in the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. Value range: 0% (no hit) to 100% (optimal).
ScalarCacheRequestCount | Discrete, Streaming | Items | The number of read requests made from executing shader code to the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size.
ScalarCacheHitCount | Discrete, Streaming | Items | The number of read requests made from executing shader code which result in a cache hit from the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size.
ScalarCacheMissCount | Discrete, Streaming | Items | The number of read requests made from executing shader code which result in a cache miss from the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size.
InstCacheHit | Discrete, Streaming | Percentage | The percentage of read requests made that hit the data in the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. Value range: 0% (no hit) to 100% (optimal).
InstCacheRequestCount | Discrete, Streaming | Items | The number of read requests made to the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size.
InstCacheHitCount | Discrete, Streaming | Items | The number of read requests which result in a cache hit from the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size.
InstCacheMissCount | Discrete, Streaming | Items | The number of read requests which result in a cache miss from the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size.
L1CacheHit | Discrete, Streaming | Percentage | The percentage of read or write requests that hit the data in the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal).
L1CacheRequestCount | Discrete, Streaming | Items | The number of read or write requests made to the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size.
L1CacheHitCount | Discrete, Streaming | Items | The number of read or write requests which result in a cache hit from the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size.
L1CacheMissCount | Discrete, Streaming | Items | The number of read or write requests which result in a cache miss from the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size.
L2CacheHit | Discrete, Streaming | Percentage | The percentage of read or write requests that hit the data in the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal).
L2CacheMiss | Discrete, Streaming | Percentage | The percentage of read or write requests that miss the data in the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. Value range: 0% (optimal) to 100% (all miss).
L2CacheRequestCountDiscrete, StreamingItemsThe number of read or write requests made to the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size.
L2CacheHitCountDiscrete, StreamingItemsThe number of read or write requests which result in a cache hit from the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size.
L2CacheMissCountDiscrete, StreamingItemsThe number of read or write requests which result in a cache miss from the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size.
L0TagConflictReadStalledCyclesDiscrete, StreamingItemsThe number of cycles read operations from the L0 cache are stalled due to tag conflicts.
L0TagConflictWriteStalledCyclesDiscrete, StreamingItemsThe number of cycles write operations to the L0 cache are stalled due to tag conflicts.
L0TagConflictAtomicStalledCyclesDiscrete, StreamingItemsThe number of cycles atomic operations on the L0 cache are stalled due to tag conflicts.
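
The hit and miss counts above can be turned into a hit rate, and a request count into an approximate byte volume using the per-cache request sizes given in the descriptions. The following is a minimal Python sketch, not part of GPUPerfAPI: the helper names `hit_percentage` and `requested_bytes` are hypothetical, the sample values are made up, and the sketch assumes every request is counted as exactly one hit or one miss, which the descriptions do not state explicitly.

```python
# Request sizes in bytes, taken from the counter descriptions above.
REQUEST_SIZE_BYTES = {
    "L0": 128,
    "Scalar": 64,
    "Inst": 64,
    "L1": 128,
    "L2": 128,
}


def hit_percentage(hit_count, miss_count):
    """Return hits as a percentage of (hits + misses).

    Assumption: each request is counted as exactly one hit or one miss.
    Returns 0.0 when no requests were recorded.
    """
    total = hit_count + miss_count
    if total == 0:
        return 0.0
    return 100.0 * hit_count / total


def requested_bytes(cache, request_count):
    """Estimate bytes requested from a cache: request count times request size."""
    return request_count * REQUEST_SIZE_BYTES[cache]


# Hypothetical sample values, not measured data.
print(hit_percentage(hit_count=750, miss_count=250))  # 75.0
print(requested_bytes("L2", 1000))                    # 128000
```

A result computed this way may differ slightly from the corresponding percentage counter (for example L2CacheHit) if the hardware counts requests differently, so treat it as a cross-check rather than a replacement.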

GlobalMemory Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| FetchSize | Discrete, Streaming | Bytes | The total bytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
| WriteSize | Discrete, Streaming | Bytes | The total bytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
| MemUnitBusy | Discrete, Streaming | Percentage | The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound). |
| MemUnitBusyCycles | Discrete, Streaming | Cycles | Number of GPU cycles the memory unit is active. The result includes the stall time (MemUnitStalledCycles). This is measured with all extra fetches and writes and any cache or memory effects taken into account. |
| MemUnitStalled | Discrete, Streaming | Percentage | The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad). |
| MemUnitStalledCycles | Discrete, Streaming | Cycles | Number of GPU cycles the memory unit is stalled. |
| WriteUnitStalled | Discrete, Streaming | Percentage | The percentage of GPUTime the Write unit is stalled. Value range: 0% to 100% (bad). |
| WriteUnitStalledCycles | Discrete, Streaming | Cycles | Number of GPU cycles the Write unit is stalled. |
| LocalVidMemBytes | Discrete | Bytes | Number of bytes read from or written to local video memory. |
| PcieBytes | Discrete | Bytes | Number of bytes sent and received over the PCIe bus. |
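
Because MemUnitBusyCycles includes the stall time reported by MemUnitStalledCycles, the two can be combined to see how much of the busy time was spent stalled. The following is a minimal Python sketch under that reading of the descriptions; the function name `mem_unit_breakdown` and the sample values are hypothetical and not part of GPUPerfAPI.

```python
def mem_unit_breakdown(busy_cycles, stalled_cycles):
    """Split memory-unit busy cycles into non-stalled cycles and a stalled share.

    Returns a tuple (non_stalled_cycles, stalled_share_percent), where the
    share is stalled cycles as a percentage of busy cycles.
    Assumption: stalled cycles are a subset of busy cycles, as the
    MemUnitBusyCycles description implies.
    """
    if stalled_cycles > busy_cycles:
        raise ValueError("stalled cycles exceed busy cycles; inputs look inconsistent")
    non_stalled = busy_cycles - stalled_cycles
    if busy_cycles == 0:
        return non_stalled, 0.0
    return non_stalled, 100.0 * stalled_cycles / busy_cycles


# Hypothetical sample values, not measured data.
print(mem_unit_breakdown(busy_cycles=8000, stalled_cycles=2000))  # (6000, 25.0)
```

A high stalled share suggests following the advice in the MemUnitStalled description and reducing the number or size of fetches and writes.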

WaveDistribution Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| WaveOccupancyPct | Streaming | Percentage | The percentage of the maximum wavefront occupancy that is currently being used. |

WaveOccupancyLimiters Group

| Counter Name | Sample Type | Usage | Brief Description |
| --- | --- | --- | --- |
| LSHSLimitedByVgpr | Streaming | Percentage | The percentage of LS and HS wave scheduling requests that are limited by VGPR availability. |
| LSHSLimitedByLds | Streaming | Percentage | The percentage of LS and HS wave scheduling requests that are limited by LDS availability. |
| LSHSLimitedByScratch | Streaming | Percentage | The percentage of LS and HS wave scheduling requests that are limited by scratch space availability. |
| HSLimitedByBarriers | Streaming | Percentage | The percentage of HS wave scheduling requests that are limited by barriers. |
| ESGSLimitedByVgpr | Discrete, Streaming | Percentage | The percentage of ES and GS wave scheduling requests that are limited by VGPR availability. |
| ESGSLimitedByLds | Streaming | Percentage | The percentage of ES and GS wave scheduling requests that are limited by LDS availability. |
| ESGSLimitedByScratch | Streaming | Percentage | The percentage of ES and GS wave scheduling requests that are limited by scratch space availability. |
| VSLimitedByVgpr | Streaming | Percentage | The percentage of VS wave scheduling requests that are limited by VGPR availability. |
| VSLimitedByScratch | Streaming | Percentage | The percentage of VS wave scheduling requests that are limited by scratch space availability. |
| VSLimitedByExport | Streaming | Percentage | The percentage of cycles that VS waves are stalled due to export space availability. |
| PSLimitedByLds | Streaming | Percentage | The percentage of PS wave scheduling requests that are limited by LDS availability. |
| PSLimitedByVgpr | Streaming | Percentage | The percentage of PS wave scheduling requests that are limited by VGPR availability. |
| PSLimitedByScratch | Streaming | Percentage | The percentage of PS wave scheduling requests that are limited by scratch space availability. |
| CSLimitedByLds | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by LDS availability. |
| CSLimitedByVgpr | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by VGPR availability. |
| CSLimitedByScratch | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by scratch space availability. |
| CSLimitedByBarriers | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by barriers. |
| CSLimitedByThreadGroupLimit | Streaming | Percentage | The percentage of CS wave scheduling requests that are limited by the thread group limit. |
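
One way to read the limiter counters above is to rank them per shader stage and look at the largest values first. The following is a minimal Python sketch of that idea, assuming the counter values have already been collected into a dictionary keyed by counter name. The function name `dominant_limiters`, the 10% default threshold, and the sample values are all hypothetical choices for illustration, not part of GPUPerfAPI or a recommended cutoff.

```python
def dominant_limiters(samples, stage_prefix, threshold=10.0):
    """Return (counter, percentage) pairs for one stage, largest first.

    samples      -- dict mapping counter names to percentage values
    stage_prefix -- stage prefix as used in the counter names, e.g. "CS" or "LSHS"
    threshold    -- arbitrary cutoff; counters below it are omitted
    """
    wanted = stage_prefix + "LimitedBy"
    matches = [
        (name, pct)
        for name, pct in samples.items()
        if name.startswith(wanted) and pct >= threshold
    ]
    return sorted(matches, key=lambda item: item[1], reverse=True)


# Hypothetical sample values, not measured data.
samples = {
    "CSLimitedByLds": 5.0,
    "CSLimitedByVgpr": 62.5,
    "CSLimitedByScratch": 0.0,
    "CSLimitedByBarriers": 12.0,
    "CSLimitedByThreadGroupLimit": 1.5,
    "PSLimitedByVgpr": 40.0,
}
print(dominant_limiters(samples, "CS"))
# [('CSLimitedByVgpr', 62.5), ('CSLimitedByBarriers', 12.0)]
```

In this made-up example, VGPR availability would be the first thing to investigate for compute shader occupancy.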