### Cubemaps

`cubeFaceIndexAMD()` and `cubeFaceCoordAMD()`. These can be useful, for example, when doing image stores to a layered image representing cube faces. Disassembling the simple HLSL shader below provides details on the VALU work.

TextureCube t; SamplerState s;
float4 main(float3 p : TEXCOORD) : SV_Target { return t.Sample(s, p); }

v_cubetc_f32 v1, v2, v3, v0 // v1 = face t coordinate
v_cubesc_f32 v4, v2, v3, v0 // v4 = face s coordinate
v_cubema_f32 v5, v2, v3, v0 // v5 = 2.0 * major axis
v_cubeid_f32 v6, v2, v3, v0 // v6 = face index (0 to 5)
v_rcp_f32 v2, abs(v5) // v2 = 1.0 / abs(2.0 * majorAxis)
s_mov_b32 s0, 0x3fc00000 // s0 = 1.5
v_mad_legacy_f32 v5, v1, v2, s0 // v5 = faceT / abs(2.0 * majorAxis) + 1.5
v_mad_legacy_f32 v4, v4, v2, s0 // v4 = faceS / abs(2.0 * majorAxis) + 1.5
image_sample v[0:3], v[4:7], s[4:11], s[12:15] dmask:0xf
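As a rough CPU-side reference for what the four cube instructions and the two MADs compute, the sketch below follows the standard cubemap face selection table (the major axis picks the face; `sc`, `tc` and `ma` are named as in the OpenGL specification). It is written in C rather than shader code so it can be run directly. The `CubeCoord` struct and `cube_face_coord()` name are hypothetical, and the tie-breaking on exactly equal axes is an assumption, not something checked against hardware.

```c
#include <math.h>

/* Hypothetical result type: face index plus biased face coordinates. */
typedef struct {
    int   face; /* 0..5 in +X, -X, +Y, -Y, +Z, -Z order */
    float s;    /* sc / |2*ma| + 1.5 */
    float t;    /* tc / |2*ma| + 1.5 */
} CubeCoord;

/* Assumes a non-zero direction; a zero vector divides by zero. */
CubeCoord cube_face_coord(float x, float y, float z) {
    float ax = fabsf(x), ay = fabsf(y), az = fabsf(z);
    float sc, tc, ma;
    CubeCoord r;
    if (az >= ax && az >= ay) {        /* Z is the major axis */
        r.face = z < 0.0f ? 5 : 4;
        sc = z < 0.0f ? -x : x;
        tc = -y;
        ma = z;
    } else if (ay >= ax) {             /* Y is the major axis */
        r.face = y < 0.0f ? 3 : 2;
        sc = x;
        tc = y < 0.0f ? -z : z;
        ma = y;
    } else {                           /* X is the major axis */
        r.face = x < 0.0f ? 1 : 0;
        sc = x < 0.0f ? z : -z;
        tc = -y;
        ma = x;
    }
    /* Same shape as the disassembly: one rcp of |2*ma|, then two MADs. */
    float inv = 1.0f / fabsf(2.0f * ma);
    r.s = sc * inv + 1.5f;
    r.t = tc * inv + 1.5f;
    return r;
}
```

For the direction `(1.0, 0.5, -0.25)` this selects face 0 (+X) with `s = 1.625` and `t = 1.25`; subtracting 1.0 gives the conventional {0.0 to 1.0} face coordinate.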

The `1.5` constant is designed such that the output face coordinate (`v4` and `v5` in the above example) range is {1.0 <= x < 2.0}, which has an advantage in bit encoding compared to {0.0 <= x < 1.0}: the sign and exponent bits are constant throughout the entire output range, so only the mantissa varies.

Note that `v_rcp_f32` counts as 4 ops when tallying VALU work. When estimating shader cost it is often useful to think in terms of the GPU’s `op:byte:tex` ratio, where `op` represents VALU instructions, `byte` represents bytes of bandwidth, and `tex` represents simple 2D 32-bit per pixel texture fetch VMEM instructions. Numbers for Fury Nano in giga-units per second are `4096:512:256` (`op:byte:tex`), which reduces to the ratio `16:2:1`. Note `flop = op * 2`, as one FMA or MAD is 2 flops.

### Octahedron Maps

// 2 temp/return VGPRs
// 2 temp SGPRs (one bool)
// 17 VALU ops
float2 Oct3To2(float3 n) {
float tx,ty;
bool neg;
// project into 2D
tx = abs(n.x) + abs(n.y);
tx = tx + abs(n.z);
tx = rcp(tx); // counts for 4 VALU ops
n.x = n.x * tx;
n.y = n.y * tx;
// unfold if on other half in Z
// n.xy stays in range {-1.0 to 1.0}; apply * 0.5 + 0.5 for the {0.0 to 1.0} texture range
tx = 1.0 - abs(n.y);
neg = n.x < 0.0;
tx = neg ? -tx : tx;
ty = 1.0 - abs(n.x);
neg = n.y < 0.0;
ty = neg ? -ty : ty;
neg = n.z <= 0.0;
n.x = neg ? tx : n.x;
n.y = neg ? ty : n.y;
return n.xy; }
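To sanity-check the mapping outside a shader, here is a minimal C port of `Oct3To2()` above. The `Float2` struct and the lowercase `oct3to2()` name are hypothetical, `rcp()` is replaced by a plain division, and the input is assumed to be a non-zero direction.

```c
#include <math.h>

/* Hypothetical stand-in for HLSL float2. */
typedef struct { float x, y; } Float2;

/* Maps a direction to the octahedron square, output in {-1.0 to 1.0}. */
Float2 oct3to2(float x, float y, float z) {
    /* project into 2D by dividing by the L1 norm */
    float inv = 1.0f / (fabsf(x) + fabsf(y) + fabsf(z));
    x *= inv;
    y *= inv;
    /* unfolded position, used only for the lower half (z <= 0) */
    float tx = 1.0f - fabsf(y);
    tx = x < 0.0f ? -tx : tx;
    float ty = 1.0f - fabsf(x);
    ty = y < 0.0f ? -ty : ty;
    Float2 r;
    r.x = z <= 0.0f ? tx : x;
    r.y = z <= 0.0f ? ty : y;
    return r;
}
```

With this port, `(0, 0, 1)` lands at the center `(0, 0)`, `(-1, 1, 2)` lands at `(-0.25, 0.25)`, and the lower-half direction `(1, 1, -2)` unfolds to `(0.75, 0.75)`.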

Given the `16:1` (`op:tex`) ratio, in theory it can be more expensive to generate the coordinates for the octahedral map than to fetch from the texture. Barring the case where the offset wraps over the texture’s edge, the above `Oct3To2() * 0.5 + 0.5` texture coordinate will just work with 2D texel offsets.
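The cost comparison can be made concrete with a little arithmetic. The sketch below uses only the Fury Nano peak rates quoted earlier and the 17 VALU op count from the `Oct3To2()` listing; the function names are hypothetical, and the model is a peak-throughput estimate that ignores latency hiding, occupancy and bandwidth.

```c
/* Fury Nano peak rates from the text, in giga-units per second. */
#define GIGA_OPS_PER_SEC 4096.0
#define GIGA_TEX_PER_SEC 256.0

/* VALU ops that can issue in the time of one simple texture fetch. */
double ops_per_tex(void) {
    return GIGA_OPS_PER_SEC / GIGA_TEX_PER_SEC;
}

/* Cost of some VALU work, expressed in texture-fetch time units. */
double valu_cost_in_tex(double valu_ops) {
    return valu_ops / ops_per_tex();
}

/* Peak GFLOPS, using flop = op * 2 (one FMA or MAD is 2 flops). */
double peak_gflops(void) {
    return GIGA_OPS_PER_SEC * 2.0;
}
```

Here `valu_cost_in_tex(17.0)` evaluates to `1.0625`, so at peak rates the 17 ops of coordinate generation cost slightly more than the fetch they feed.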

// Check for offset over texture edge.
// 1 temp VGPR
// 2 temp/return SGPRs (one bool)
// 2 VALU ops
bool OctFlipped(float2 r) {
float t = max(abs(r.x), abs(r.y));
return t >= 1.0; }
// Example of computing mirrored repeat sampling
// of an octahedron map with a small texel offset.
// Note this is not designed to solve the double wrap case.
// The "base" is as computed by Oct3To2() above.
float2 coord = base + float2(-2.0, 2.0); // 2 VALU
coord = OctFlipped(coord) ? -coord : coord; // 4 VALU
coord = coord * 0.5 + 0.5; // 2 VALU
