RDNA Counters
Timing Group
Counter Name | Usage | Brief Description |
---|---|---|
GPUTime | Nanoseconds | Time this API command took to execute on the GPU in nanoseconds from the time the previous command reached the bottom of the pipeline (BOP) to the time this command reaches the bottom of the pipeline (BOP). Does not include time that draw calls are processed in parallel. |
ExecutionDuration | Nanoseconds | GPU command execution duration in nanoseconds, from the time the command enters the top of the pipeline (TOP) to the time the command reaches the bottom of the pipeline (BOP). Does not include time that draw calls are processed in parallel. |
ExecutionStart | Nanoseconds | GPU command execution start time in nanoseconds. This is the time the command enters the top of the pipeline (TOP). |
ExecutionEnd | Nanoseconds | GPU command execution end time in nanoseconds. This is the time the command reaches the bottom of the pipeline (BOP). |
GPUBusy | Percentage | The percentage of time the GPU command processor was busy. |
GPUBusyCycles | Cycles | Number of GPU cycles that the GPU command processor was busy. |
TessellatorBusy | Percentage | The percentage of time the tessellation engine is busy. |
TessellatorBusyCycles | Cycles | Number of GPU cycles that the tessellation engine is busy. |
VsGsBusy | Percentage | The percentage of time the ShaderUnit has VS or GS work to do in a VS-[GS-]PS pipeline. |
VsGsBusyCycles | Cycles | Number of GPU cycles that the ShaderUnit has VS or GS work to do in a VS-[GS-]PS pipeline. |
VsGsTime | Nanoseconds | Time VS or GS are busy in nanoseconds in a VS-[GS-]PS pipeline. |
PreTessellationBusy | Percentage | The percentage of time the ShaderUnit has VS and HS work to do in a pipeline that uses tessellation. |
PreTessellationBusyCycles | Cycles | Number of GPU cycles that the ShaderUnit has VS and HS work to do in a pipeline that uses tessellation. |
PreTessellationTime | Nanoseconds | Time VS and HS are busy in nanoseconds in a pipeline that uses tessellation. |
PostTessellationBusy | Percentage | The percentage of time the ShaderUnit has DS or GS work to do in a pipeline that uses tessellation. |
PostTessellationBusyCycles | Cycles | Number of GPU cycles that the ShaderUnit has DS or GS work to do in a pipeline that uses tessellation. |
PostTessellationTime | Nanoseconds | Time DS or GS are busy in nanoseconds in a pipeline that uses tessellation. |
PSBusy | Percentage | The percentage of time the ShaderUnit has pixel shader work to do. |
PSBusyCycles | Cycles | Number of GPU cycles that the ShaderUnit has pixel shader work to do. |
PSTime | Nanoseconds | Time pixel shaders are busy in nanoseconds. |
CSBusy | Percentage | The percentage of time the ShaderUnit has compute shader work to do. |
CSBusyCycles | Cycles | Number of GPU cycles that the ShaderUnit has compute shader work to do. |
CSTime | Nanoseconds | Time compute shaders are busy in nanoseconds. |
PrimitiveAssemblyBusy | Percentage | The percentage of GPUTime that primitive assembly (clipping and culling) is busy. High values may be caused by having many small primitives; mid to low values may indicate pixel shader or output buffer bottleneck. |
PrimitiveAssemblyBusyCycles | Cycles | Number of GPU cycles the primitive assembly (clipping and culling) is busy. High values may be caused by having many small primitives; mid to low values may indicate pixel shader or output buffer bottleneck. |
TexUnitBusy | Percentage | The percentage of GPUTime the texture unit is active. This is measured with all extra fetches and any cache or memory effects taken into account. |
TexUnitBusyCycles | Cycles | Number of GPU cycles the texture unit is active. This is measured with all extra fetches and any cache or memory effects taken into account. |
DepthStencilTestBusy | Percentage | Percentage of time GPU spent performing depth and stencil tests relative to GPUBusy. |
DepthStencilTestBusyCycles | Cycles | Number of GPU cycles spent performing depth and stencil tests. |
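
ExecutionStart and ExecutionEnd mark the TOP and BOP timestamps of a command, so the raw interval between them can be computed directly and used to check whether two commands were in flight together. The sketch below assumes the counter values have already been collected into plain Python dictionaries; the dictionary layout and the helper names are hypothetical and not part of any profiling API. The raw interval may differ from the reported ExecutionDuration, which excludes time that draw calls are processed in parallel.

```python
def top_to_bop_ns(sample):
    """Raw TOP-to-BOP interval for one command, in nanoseconds."""
    return sample["ExecutionEnd"] - sample["ExecutionStart"]


def overlaps(a, b):
    """True if the TOP-to-BOP intervals of two commands intersect."""
    return (a["ExecutionStart"] < b["ExecutionEnd"]
            and b["ExecutionStart"] < a["ExecutionEnd"])


# Hypothetical timestamps for two consecutive draw calls.
draw0 = {"ExecutionStart": 1_000, "ExecutionEnd": 4_000}
draw1 = {"ExecutionStart": 3_000, "ExecutionEnd": 6_500}

print(top_to_bop_ns(draw0))    # 3000
print(overlaps(draw0, draw1))  # True
```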
VertexGeometry Group
Counter Name | Usage | Brief Description |
---|---|---|
GSVerticesOut | Items | The number of vertices output by the GS. |
PreTessellation Group
Counter Name | Usage | Brief Description |
---|---|---|
PreTessVerticesIn | Items | The number of vertices processed by the VS and HS when using tessellation. |
PreTessVALUInstCount | Items | Average number of vector ALU instructions executed for the VS and HS in a pipeline that uses tessellation. Affected by flow control. |
PreTessSALUInstCount | Items | Average number of scalar ALU instructions executed for the VS and HS in a pipeline that uses tessellation. Affected by flow control. |
PreTessVALUBusy | Percentage | The percentage of GPUTime vector ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation. |
PreTessVALUBusyCycles | Cycles | Number of GPU cycles where vector ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation. |
PreTessSALUBusy | Percentage | The percentage of GPUTime scalar ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation. |
PreTessSALUBusyCycles | Cycles | Number of GPU cycles where scalar ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation. |
PostTessellation Group
Counter Name | Usage | Brief Description |
---|---|---|
PostTessPrimsOut | Items | The number of primitives output by the DS and GS when using tessellation. |
PostTessVALUInstCount | Items | Average number of vector ALU instructions executed for the DS and GS in a pipeline that uses tessellation. Affected by flow control. |
PostTessSALUInstCount | Items | Average number of scalar ALU instructions executed for the DS and GS in a pipeline that uses tessellation. Affected by flow control. |
PostTessVALUBusy | Percentage | The percentage of GPUTime vector ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation. |
PostTessVALUBusyCycles | Cycles | Number of GPU cycles where vector ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation. |
PostTessSALUBusy | Percentage | The percentage of GPUTime scalar ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation. |
PostTessSALUBusyCycles | Cycles | Number of GPU cycles where scalar ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation. |
PrimitiveAssembly Group
Counter Name | Usage | Brief Description |
---|---|---|
PrimitivesIn | Items | The number of primitives received by the hardware. This includes primitives generated by tessellation. |
CulledPrims | Items | The number of culled primitives. Typical reasons include scissor, the primitive having zero area, and back or front face culling. |
ClippedPrims | Items | The number of primitives that required one or more clipping operations due to intersecting the view volume or user clip planes. |
PAStalledOnRasterizer | Percentage | Percentage of GPUTime that primitive assembly waits for rasterization to be ready to accept data. This roughly indicates for what percentage of time the pipeline is bottlenecked by pixel operations. |
PAStalledOnRasterizerCycles | Cycles | Number of GPU cycles the primitive assembly waits for rasterization to be ready to accept data. Indicates the number of GPU cycles the pipeline is bottlenecked by pixel operations. |
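
The item counters in this group are often easier to read as fractions of the incoming primitives. A minimal sketch, assuming CulledPrims and ClippedPrims are both counted against PrimitivesIn for the same command; the function name and the sample values are hypothetical:

```python
def primitive_fractions(primitives_in, culled_prims, clipped_prims):
    """Return culled and clipped primitives as fractions of PrimitivesIn."""
    if primitives_in == 0:
        # Nothing reached primitive assembly, so both fractions are zero.
        return {"culled": 0.0, "clipped": 0.0}
    return {
        "culled": culled_prims / primitives_in,
        "clipped": clipped_prims / primitives_in,
    }


# Hypothetical counter values for one draw call.
fractions = primitive_fractions(primitives_in=10_000,
                                culled_prims=4_000,
                                clipped_prims=250)
print(fractions)
```

A high culled fraction means much of the geometry work upstream of primitive assembly produced primitives that were never rasterized.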
PixelShader Group
Counter Name | Usage | Brief Description |
---|---|---|
PSPixelsOut | Items | Pixels exported from shader to color buffers. Does not include killed or alpha tested pixels; if there are multiple render targets, each render target receives one export, so this will be 2 for 1 pixel written to two RTs. |
PSExportStalls | Percentage | Percentage of GPUBusy during which pixel shader output is stalled. Should be zero for PS or further upstream limited cases; if not zero, indicates a bottleneck in late Z testing or in the color buffer. |
PSExportStallsCycles | Cycles | Number of GPU cycles the pixel shader output stalls. Should be zero for PS or further upstream limited cases; if not zero, indicates a bottleneck in late Z testing or in the color buffer. |
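
Because PSPixelsOut counts one export per render target, the number of distinct pixels shaded can be estimated by dividing by the render target count. This is a sketch under the assumption that every surviving pixel writes to all bound render targets; the function name and the sample values are hypothetical:

```python
def pixels_per_render_target(ps_pixels_out, render_target_count):
    """Estimate distinct pixels written, given one export per render target."""
    if render_target_count <= 0:
        raise ValueError("render_target_count must be positive")
    return ps_pixels_out / render_target_count


# Hypothetical: a 1920x1080 full-screen pass writing to two render targets.
exports = 1920 * 1080 * 2
print(pixels_per_render_target(exports, 2))  # 2073600.0
```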
ComputeShader Group
Counter Name | Usage | Brief Description |
---|---|---|
CSThreadGroupsLaunched | Items | Total number of thread groups launched. |
CSWavefrontsLaunched | Items | The total number of wavefronts launched for the CS. |
CSThreadsLaunched | Items | The number of CS threads launched and processed by the hardware. |
CSThreadGroupSize | Items | The number of CS threads within each thread group. |
CSVALUInsts | Items | The average number of vector ALU instructions executed per work-item (affected by flow control). |
CSVALUUtilization | Percentage | The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of the wave size. Value range: 0% (bad) to 100% (ideal, no thread divergence). |
CSSALUInsts | Items | The average number of scalar ALU instructions executed per work-item (affected by flow control). |
CSVFetchInsts | Items | The average number of vector fetch instructions from the video memory executed per work-item (affected by flow control). |
CSSFetchInsts | Items | The average number of scalar fetch instructions from the video memory executed per work-item (affected by flow control). |
CSVWriteInsts | Items | The average number of vector write instructions to the video memory executed per work-item (affected by flow control). |
CSVALUBusy | Percentage | The percentage of GPUTime vector ALU instructions are processed. Value range: 0% (bad) to 100% (optimal). |
CSVALUBusyCycles | Cycles | Number of GPU cycles where vector ALU instructions are processed. |
CSSALUBusy | Percentage | The percentage of GPUTime scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal). |
CSSALUBusyCycles | Cycles | Number of GPU cycles where scalar ALU instructions are processed. |
CSGDSInsts | Items | The average number of GDS read or GDS write instructions executed per work-item (affected by flow control). |
CSLDSInsts | Items | The average number of LDS read/write instructions executed per work-item (affected by flow control). |
CSALUStalledByLDS | Percentage | The percentage of GPUTime ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad). |
CSALUStalledByLDSCycles | Cycles | Number of GPU cycles each wavefront's ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. |
CSLDSBankConflict | Percentage | The percentage of GPUTime LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad). |
CSLDSBankConflictCycles | Cycles | Number of GPU cycles the LDS is stalled by bank conflicts. Value range: 0 (optimal) to GPUBusyCycles (bad). |
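
The CSVALUUtilization description notes that a work-group size that is not a multiple of the wave size lowers utilization. The sketch below estimates how many lanes are left idle by that padding alone. The wave size is passed in as an assumption (32 is used as the default here; the actual wave size depends on how the shader was compiled), and the function name is hypothetical. Thread divergence would lower the measured counter further, so the result is only an upper bound.

```python
import math


def wave_padding(thread_group_size, wave_size=32):
    """Return (waves per group, fraction of lanes that hold a real thread)."""
    waves = math.ceil(thread_group_size / wave_size)
    lanes = waves * wave_size
    return waves, thread_group_size / lanes


# A 48-thread group needs two 32-wide waves; a quarter of the lanes are idle.
print(wave_padding(48))  # (2, 0.75)
# A 64-thread group fills both waves completely.
print(wave_padding(64))  # (2, 1.0)
```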
TextureUnit Group
Counter Name | Usage | Brief Description |
---|---|---|
TexTriFilteringPct | Percentage | Percentage of pixels that received trilinear filtering. Note that not all pixels for which trilinear filtering is enabled will receive it (e.g. if the texture is magnified). |
TexTriFilteringCount | Items | Count of pixels that received trilinear filtering. Note that not all pixels for which trilinear filtering is enabled will receive it (e.g. if the texture is magnified). |
NoTexTriFilteringCount | Items | Count of pixels that did not receive trilinear filtering. |
TexVolFilteringPct | Percentage | Percentage of pixels that received volume filtering. |
TexVolFilteringCount | Items | Count of pixels that received volume filtering. |
NoTexVolFilteringCount | Items | Count of pixels that did not receive volume filtering. |
TexAveAnisotropy | Items | The average degree of anisotropy applied. A number between 1 and 16. The anisotropic filtering algorithm only applies samples where they are required (e.g. there will be no extra anisotropic samples if the view vector is perpendicular to the surface) so this can be much lower than the requested anisotropy. |
DepthAndStencil Group
Counter Name | Usage | Brief Description |
---|---|---|
HiZTilesAccepted | Percentage | Percentage of tiles accepted by HiZ that will be rendered to the depth or color buffers. |
HiZTilesAcceptedCount | Items | Count of tiles accepted by HiZ that will be rendered to the depth or color buffers. |
HiZTilesRejectedCount | Items | Count of tiles not accepted by HiZ. |
PreZTilesDetailCulled | Percentage | Percentage of tiles rejected because the associated primitive had no contributing area. |
PreZTilesDetailCulledCount | Items | Count of tiles rejected because the associated primitive had no contributing area. |
PreZTilesDetailSurvivingCount | Items | Count of tiles surviving because the associated primitive had contributing area. |
HiZQuadsCulled | Percentage | Percentage of quads that did not have to continue on in the pipeline after HiZ. They may be written directly to the depth buffer, or culled completely. Consistently low values here may suggest that the Z-range is not being fully utilized. |
HiZQuadsCulledCount | Items | Count of quads that did not have to continue on in the pipeline after HiZ. They may be written directly to the depth buffer, or culled completely. Consistently low values here may suggest that the Z-range is not being fully utilized. |
HiZQuadsAcceptedCount | Items | Count of quads that did continue on in the pipeline after HiZ. |
PreZQuadsCulled | Percentage | Percentage of quads rejected based on the detailZ and earlyZ tests. |
PreZQuadsCulledCount | Items | Count of quads rejected based on the detailZ and earlyZ tests. |
PreZQuadsSurvivingCount | Items | Count of quads surviving detailZ and earlyZ tests. |
PostZQuads | Percentage | Percentage of quads for which the pixel shader will run and may be postZ tested. |
PostZQuadCount | Items | Count of quads for which the pixel shader will run and may be postZ tested. |
PreZSamplesPassing | Items | Number of samples tested for Z before shading and passed. |
PreZSamplesFailingS | Items | Number of samples tested for Z before shading and failed stencil test. |
PreZSamplesFailingZ | Items | Number of samples tested for Z before shading and failed Z test. |
PostZSamplesPassing | Items | Number of samples tested for Z after shading and passed. |
PostZSamplesFailingS | Items | Number of samples tested for Z after shading and failed stencil test. |
PostZSamplesFailingZ | Items | Number of samples tested for Z after shading and failed Z test. |
ZUnitStalled | Percentage | The percentage of GPUTime the depth buffer spends waiting for the color buffer to be ready to accept data. High figures here indicate a bottleneck in color buffer operations. |
ZUnitStalledCycles | Cycles | Number of GPU cycles the depth buffer spends waiting for the color buffer to be ready to accept data. Larger numbers indicate a bottleneck in color buffer operations. |
DBMemRead | Bytes | Number of bytes read from the depth buffer. |
DBMemWritten | Bytes | Number of bytes written to the depth buffer. |
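
The per-sample counters can be combined into a pass/fail breakdown for either the PreZ or the PostZ stage. A minimal sketch, assuming the Passing, FailingS, and FailingZ counters of one stage together account for every sample tested at that stage; the function name and the sample values are hypothetical:

```python
def z_sample_breakdown(passing, failing_s, failing_z):
    """Return each outcome as a fraction of all samples tested at one stage."""
    total = passing + failing_s + failing_z
    if total == 0:
        return None  # no samples were tested at this stage
    return {
        "pass": passing / total,
        "fail_stencil": failing_s / total,
        "fail_z": failing_z / total,
    }


# Hypothetical PreZ counters: PreZSamplesPassing, PreZSamplesFailingS,
# PreZSamplesFailingZ.
pre_z = z_sample_breakdown(passing=600, failing_s=100, failing_z=300)
print(pre_z)
```

Samples rejected before shading never reach the pixel shader, so a large `fail_z` share at the PreZ stage generally means early depth testing is saving shading work.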
ColorBuffer Group
Counter Name | Usage | Brief Description |
---|---|---|
CBMemRead | Bytes | Number of bytes read from the color buffer. |
CBColorAndMaskRead | Bytes | Total number of bytes read from the color and mask buffers. |
CBMemWritten | Bytes | Number of bytes written to the color buffer. |
CBColorAndMaskWritten | Bytes | Total number of bytes written to the color and mask buffers. |
CBSlowPixelPct | Percentage | Percentage of pixels written to the color buffer using a half-rate or quarter-rate format. |
CBSlowPixelCount | Items | Number of pixels written to the color buffer using a half-rate or quarter-rate format. |
MemoryCache Group
Counter Name | Usage | Brief Description |
---|---|---|
L0CacheHit | Percentage | The percentage of read requests that hit the data in the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
L0CacheRequestCount | Items | The number of read requests made to the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. |
L0CacheHitCount | Items | The number of read requests which result in a cache hit from the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. |
L0CacheMissCount | Items | The number of read requests which result in a cache miss from the L0 cache. The L0 cache contains vector data, which is data that may vary in each thread across the wavefront. Each request is 128 bytes in size. |
ScalarCacheHit | Percentage | The percentage of read requests made from executing shader code that hit the data in the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
ScalarCacheRequestCount | Items | The number of read requests made from executing shader code to the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. |
ScalarCacheHitCount | Items | The number of read requests made from executing shader code which result in a cache hit from the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. |
ScalarCacheMissCount | Items | The number of read requests made from executing shader code which result in a cache miss from the Scalar cache. The Scalar cache contains data that does not vary in each thread across the wavefront. Each request is 64 bytes in size. |
InstCacheHit | Percentage | The percentage of read requests made that hit the data in the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
InstCacheRequestCount | Items | The number of read requests made to the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. |
InstCacheHitCount | Items | The number of read requests which result in a cache hit from the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. |
InstCacheMissCount | Items | The number of read requests which result in a cache miss from the Instruction cache. The Instruction cache supplies shader code to an executing shader. Each request is 64 bytes in size. |
L1CacheHit | Percentage | The percentage of read or write requests that hit the data in the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
L1CacheRequestCount | Items | The number of read or write requests made to the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size. |
L1CacheHitCount | Items | The number of read or write requests which result in a cache hit from the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size. |
L1CacheMissCount | Items | The number of read or write requests which result in a cache miss from the L1 cache. The L1 cache is shared across all WGPs in a single shader engine. Each request is 128 bytes in size. |
L2CacheHit | Percentage | The percentage of read or write requests that hit the data in the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. Value range: 0% (no hit) to 100% (optimal). |
L2CacheMiss | Percentage | The percentage of read or write requests that miss the data in the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. Value range: 0% (optimal) to 100% (all miss). |
L2CacheRequestCount | Items | The number of read or write requests made to the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. |
L2CacheHitCount | Items | The number of read or write requests which result in a cache hit from the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. |
L2CacheMissCount | Items | The number of read or write requests which result in a cache miss from the L2 cache. The L2 cache is shared by many blocks across the GPU, including the Command Processor, Geometry Engine, all WGPs, all Render Backends, and others. Each request is 128 bytes in size. |
L0TagConflictReadStalledCycles | Items | The number of cycles read operations from the L0 cache are stalled due to tag conflicts. |
L0TagConflictWriteStalledCycles | Items | The number of cycles write operations to the L0 cache are stalled due to tag conflicts. |
L0TagConflictAtomicStalledCycles | Items | The number of cycles atomic operations on the L0 cache are stalled due to tag conflicts. |
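
Since each cache lists a request count, a hit count, and a fixed request size, a hit percentage and the total bytes requested can be derived from the raw counts. This is a sketch, assuming the hit percentage is simply hits divided by requests; the request sizes come from the descriptions above, while the function names and the sample values are hypothetical:

```python
# Request sizes in bytes, as stated in the counter descriptions.
REQUEST_BYTES = {"L0": 128, "Scalar": 64, "Inst": 64, "L1": 128, "L2": 128}


def hit_rate_pct(hit_count, request_count):
    """Hit percentage from raw counts; 0.0 when no requests were made."""
    if request_count == 0:
        return 0.0
    return 100.0 * hit_count / request_count


def requested_bytes(cache, request_count):
    """Total bytes requested from a cache, given its fixed request size."""
    return request_count * REQUEST_BYTES[cache]


# Hypothetical values for L0CacheRequestCount and L0CacheHitCount.
print(hit_rate_pct(850_000, 1_000_000))    # 85.0
print(requested_bytes("L0", 1_000_000))    # 128000000
```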
GlobalMemory Group
Counter Name | Usage | Brief Description |
---|---|---|
FetchSize | Bytes | The total bytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
WriteSize | Bytes | The total bytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
MemUnitBusy | Percentage | The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound). |
MemUnitBusyCycles | Cycles | Number of GPU cycles the memory unit is active. The result includes the stall time (MemUnitStalledCycles). This is measured with all extra fetches and writes and any cache or memory effects taken into account. |
MemUnitStalled | Percentage | The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad). |
MemUnitStalledCycles | Cycles | Number of GPU cycles the memory unit is stalled. |
WriteUnitStalled | Percentage | The percentage of GPUTime the Write unit is stalled. Value range: 0% to 100% (bad). |
WriteUnitStalledCycles | Cycles | Number of GPU cycles the Write unit is stalled. |
LocalVidMemBytes | Bytes | Number of bytes read from or written to local video memory. |
PcieBytes | Bytes | Number of bytes sent and received over the PCIe bus. |
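
FetchSize and WriteSize are reported in bytes and GPUTime in nanoseconds, so an average memory throughput for a command can be estimated by dividing one by the other. A minimal sketch, assuming all three counters were sampled for the same command; the function name and the sample values are hypothetical:

```python
def memory_throughput_gb_per_s(fetch_size, write_size, gpu_time_ns):
    """Average video memory throughput in GB/s (1 GB = 10**9 bytes).

    Bytes per nanosecond is numerically equal to gigabytes per second.
    """
    if gpu_time_ns <= 0:
        raise ValueError("gpu_time_ns must be positive")
    return (fetch_size + write_size) / gpu_time_ns


# Hypothetical: 3 MB fetched and 1 MB written during a 2 ms command.
print(memory_throughput_gb_per_s(3_000_000, 1_000_000, 2_000_000))  # 2.0
```

Comparing this figure with the board's peak memory bandwidth gives a rough sense of how close a command is to being memory-bound; MemUnitBusy and MemUnitStalled offer the hardware's own view of the same question.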