Cubemaps
The GL_AMD_gcn_shader GLSL extension exposes the cube face computations as cubeFaceIndexAMD() and cubeFaceCoordAMD(). These can be useful, for example, when doing image stores to a layered image representing cube faces. Disassembling the simple HLSL shader below provides details on the VALU work.
TextureCube t; SamplerState s;
float4 main(float3 p : TEXCOORD) : SV_Target { return t.Sample(s, p); }
v_cubetc_f32 v1, v2, v3, v0 // v1 = face t coordinate
v_cubesc_f32 v4, v2, v3, v0 // v4 = face s coordinate
v_cubema_f32 v5, v2, v3, v0 // v5 = 2.0 * major axis
v_cubeid_f32 v6, v2, v3, v0 // v6 = face index (0 to 5)
v_rcp_f32 v2, abs(v5) // v2 = 1.0 / abs(2.0 * majorAxis)
s_mov_b32 s0, 0x3fc00000 // s0 = 1.5
v_mad_legacy_f32 v5, v1, v2, s0 // v5 = faceT / abs(2.0 * majorAxis) + 1.5
v_mad_legacy_f32 v4, v4, v2, s0 // v4 = faceS / abs(2.0 * majorAxis) + 1.5
image_sample v[0:3], v[4:7], s[4:11], s[12:15] dmask:0xf
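The VALU sequence above can be mirrored on the CPU to see what each step produces. The following is a minimal C sketch, assuming the conventional cubemap face selection table (faces ordered +X, -X, +Y, -Y, +Z, -Z); cube_coord(), CubeCoord and float_bits() are hypothetical names, and the code has not been checked against hardware output. float_bits() is included to inspect the encoding of the biased coordinates: any value in {1.0 <= x < 2.0} has sign bit 0 and biased exponent 127.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Result of the cube setup: face index plus biased face coordinates. */
typedef struct {
    int   face;  /* 0..5, assumed order +X, -X, +Y, -Y, +Z, -Z */
    float s;     /* sc / abs(2 * ma) + 1.5 */
    float t;     /* tc / abs(2 * ma) + 1.5 */
} CubeCoord;

/* CPU sketch of the face selection and coordinate math (hypothetical helper). */
static CubeCoord cube_coord(float x, float y, float z)
{
    float ax = fabsf(x), ay = fabsf(y), az = fabsf(z);
    float sc, tc, ma2;
    CubeCoord r;

    if (az >= ax && az >= ay) {          /* Z is the major axis */
        r.face = (z < 0.0f) ? 5 : 4;
        sc = (z < 0.0f) ? -x : x;
        tc = -y;
        ma2 = 2.0f * z;
    } else if (ay >= ax) {               /* Y is the major axis */
        r.face = (y < 0.0f) ? 3 : 2;
        sc = x;
        tc = (y < 0.0f) ? -z : z;
        ma2 = 2.0f * y;
    } else {                             /* X is the major axis */
        r.face = (x < 0.0f) ? 1 : 0;
        sc = (x < 0.0f) ? z : -z;
        tc = -y;
        ma2 = 2.0f * x;
    }

    float inv = 1.0f / fabsf(ma2);       /* the v_rcp_f32 step */
    r.s = sc * inv + 1.5f;               /* the two v_mad_legacy_f32 steps */
    r.t = tc * inv + 1.5f;
    return r;
}

/* Raw IEEE 754 binary32 bits of a float. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For instance, the direction (1.0, 0.5, -0.25) selects face 0, and float_bits(1.5f) is 0x3fc00000, the constant loaded by s_mov_b32 in the disassembly.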
The 1.5 constant is designed such that the output face coordinate (v4 and v5 in the above example) range is {1.0 <= x < 2.0}, which has an advantage in bit encoding compared to {0.0 <= x < 1.0}: the sign and exponent bits are constant throughout the entire output range.

In the VALU cost of this sequence, v_rcp_f32 counts as 4 ops. When estimating shader cost it is often useful to think in terms of the GPU's op:byte:tex ratio, where op represents VALU instructions, byte represents bytes of bandwidth, and tex represents simple 2D 32-bit per pixel texture fetch VMEM instructions. Numbers for Fury Nano in giga-units per second are 4096:512:256 (op:byte:tex), which reduces to the ratio 16:2:1. Note flop = op * 2, as one FMA or MAD is 2 flops.

Octahedron Maps
// 2 temp/return VGPRs
// 2 temp SGPRs (one bool)
// 17 VALU ops
float2 Oct3To2(float3 n) {
float tx,ty;
bool neg;
// project into 2D
tx = abs(n.x) + abs(n.y);
tx = tx + abs(n.z);
tx = rcp(tx); // counts for 4 VALU ops
n.x = n.x * tx;
n.y = n.y * tx;
// unfold if on other half in Z
// n.xy stays in {-1.0 to 1.0}; apply * 0.5 + 0.5 for the {0.0 to 1.0} output range
tx = 1.0 - abs(n.y);
neg = n.x < 0.0;
tx = neg ? -tx : tx;
ty = 1.0 - abs(n.x);
neg = n.y < 0.0;
ty = neg ? -ty : ty;
neg = n.z <= 0.0;
n.x = neg ? tx : n.x;
n.y = neg ? ty : n.y;
return n.xy; }
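For checking the mapping outside a shader, Oct3To2() can be ported to the CPU. The sketch below is a plain C transcription under the assumption that rcp() can be replaced by an ordinary division; Float2, oct3to2() and oct_to_uv() are hypothetical names introduced for illustration.

```c
#include <math.h>
#include <stdbool.h>

typedef struct { float x, y; } Float2;

/* CPU port of Oct3To2(); returns coordinates in {-1.0 to 1.0}. */
static Float2 oct3to2(float nx, float ny, float nz)
{
    /* project onto the octahedron abs(x) + abs(y) + abs(z) = 1 */
    float sum = fabsf(nx) + fabsf(ny);
    sum = sum + fabsf(nz);
    float inv = 1.0f / sum;              /* rcp() in the shader */
    nx *= inv;
    ny *= inv;

    /* candidate coordinates for the unfolded lower half */
    float tx = 1.0f - fabsf(ny);
    float ty = 1.0f - fabsf(nx);
    if (nx < 0.0f) tx = -tx;
    if (ny < 0.0f) ty = -ty;

    /* only the sign of z matters, so the unscaled nz is used */
    bool lower = nz <= 0.0f;
    Float2 r = { lower ? tx : nx, lower ? ty : ny };
    return r;
}

/* Remap {-1.0 to 1.0} to the {0.0 to 1.0} texture coordinate range. */
static Float2 oct_to_uv(Float2 p)
{
    Float2 uv = { p.x * 0.5f + 0.5f, p.y * 0.5f + 0.5f };
    return uv;
}
```

With this port, the +Z direction lands at the center of the map and the -Z direction at a corner, which is the expected layout for an octahedral unfold.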
At a 16:1 (op:tex) ratio, it can in theory be more expensive to generate the coordinates for the octahedral map (17 VALU ops) than to fetch from the texture. Barring the case where the offset wraps over the texture's edge, the above Oct3To2() * 0.5 + 0.5 texture coordinate will just work with 2D texel offsets.
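The budget comparison can be reproduced with a few lines of arithmetic. This is a small C sketch using only the Fury Nano figures quoted earlier; the macro and function names are hypothetical, and peak rates are an upper bound rather than a measurement.

```c
/* Fury Nano peak rates quoted in the text, in giga-units per second. */
#define GIGA_OPS   4096  /* VALU ops */
#define GIGA_BYTES 512   /* bytes of bandwidth */
#define GIGA_TEX   256   /* simple 2D 32-bit texture fetches */

/* VALU ops available per texture fetch at peak rates. */
static int ops_per_tex(void) { return GIGA_OPS / GIGA_TEX; }

/* Bytes of bandwidth available per texture fetch at peak rates. */
static int bytes_per_tex(void) { return GIGA_BYTES / GIGA_TEX; }

/* Peak gigaflops: one FMA or MAD is counted as 2 flops. */
static int giga_flops(void) { return GIGA_OPS * 2; }

/* VALU ops by which a sequence exceeds the per-fetch budget
   (a negative result means headroom). */
static int ops_over_budget(int valu_ops) { return valu_ops - ops_per_tex(); }
```

Calling ops_over_budget(17) for Oct3To2() gives 1, which is the sense in which generating the coordinate can cost more than the fetch itself.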
// Check for offset over texture edge.
// 1 temp VGPR
// 2 temp/return SGPRs (one bool)
// 2 VALU ops
bool OctFlipped(float2 r) {
float t = max(abs(r.x), abs(r.y));
return t >= 1.0; }
// Example of computing mirrored repeat sampling
// of an octahedron map with a small texel offset.
// Note this is not designed to solve the double wrap case.
// The "base" is as computed by Oct3To2() above.
float2 coord = base + float2(-2.0, 2.0); // 2 VALU
coord = OctFlipped(coord) ? -coord : coord; // 4 VALU
coord = coord * 0.5 + 0.5; // 2 VALU