luma computation with max3(red,green,blue). The luma-based tonemapper has variable weighting based on color hue, which is not present in the max3-based tonemapper. The max3-based tonemapper removes the hue shift on mixed color edges of similar value.

The max3() operation on all versions of GCN maps to a single instruction, v_max3_f32. The instruction is documented in the instruction set reference for GCN3 (Fiji). The driver-side AMD DX shader compiler will automatically transform max(x, max(y, z)) into max3(x, y, z). This functionality, as well as min3() and mid3(), is also exposed explicitly in GLSL via the following extension: AMD_shader_trinary_minmax.
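For HLSL, where min3() and mid3() are not built in, portable fallbacks can be written in the same style as the max3() helper used in this post. This is only a sketch: the post states that the compiler folds the nested max() pattern into a single instruction, but whether the equivalent min and median patterns below are also folded is an assumption, not something confirmed here.

```hlsl
// Hypothetical fallback helpers, mirroring the max3() helper in this post.

// Smallest of three values.
float min3(float x, float y, float z) { return min(x, min(y, z)); }

// Median of three values: the larger of min(x, y) and
// the smaller of max(x, y) and z.
float mid3(float x, float y, float z) { return max(min(x, y), min(max(x, y), z)); }
```

As a quick sanity check, mid3(3.0, 1.0, 2.0) evaluates max(1.0, min(3.0, 2.0)), which is 2.0.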
float max3(float x, float y, float z) { return max(x, max(y, z)); }
// Apply this to tonemap linear HDR color "c" after a sample is fetched in the resolve.
// Note "c" 1.0 maps to the expected limit of low-dynamic-range monitor output.
float3 Tonemap(float3 c) { return c * rcp(max3(c.r, c.g, c.b) + 1.0); }
// When the filter kernel is a weighted sum of fetched colors,
// it is cheaper to fold the weighting into the tonemap operation.
float3 TonemapWithWeight(float3 c, float w) { return c * (w * rcp(max3(c.r, c.g, c.b) + 1.0)); }
// Apply this to restore the linear HDR color before writing out the result of the resolve.
float3 TonemapInvert(float3 c) { return c * rcp(1.0 - max3(c.r, c.g, c.b)); }
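It may help to see why TonemapInvert() exactly undoes Tonemap(). The short derivation below uses notation that is not in the post: T for Tonemap(), T^{-1} for TonemapInvert(), and m for max3(c.r, c.g, c.b). It assumes non-negative color components.

```latex
\begin{align*}
m &= \max(c_r, c_g, c_b), \qquad m \ge 0 \\
T(c) &= \frac{c}{1 + m} \\
\max\bigl(T(c)_r, T(c)_g, T(c)_b\bigr) &= \frac{m}{1 + m} < 1 \\
1 - \frac{m}{1 + m} &= \frac{1}{1 + m} \\
T^{-1}\bigl(T(c)\bigr) &= \frac{c}{1 + m} \cdot (1 + m) = c
\end{align*}
```

Because every tonemapped sample has a maximum channel below 1, a weighted sum with non-negative weights that add up to 1 also stays below 1, so the 1.0 - max3() term in TonemapInvert() remains positive.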
return TonemapInvert(
    TonemapWithWeight(sample0, 0.25) +
    TonemapWithWeight(sample1, 0.25) +
    TonemapWithWeight(sample2, 0.25) +
    TonemapWithWeight(sample3, 0.25));
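The four-sample expression above could sit inside a complete resolve pass along the following lines. This is a minimal sketch under stated assumptions: the resource name msaaColor, the entry point ResolvePS, the 4x sample count, and the use of a box filter with equal weights are all illustrative and are not taken from the post.

```hlsl
float max3(float x, float y, float z) { return max(x, max(y, z)); }
float3 TonemapWithWeight(float3 c, float w) { return c * (w * rcp(max3(c.r, c.g, c.b) + 1.0)); }
float3 TonemapInvert(float3 c) { return c * rcp(1.0 - max3(c.r, c.g, c.b)); }

// Hypothetical 4x MSAA linear HDR color target.
Texture2DMS<float4, 4> msaaColor;

float4 ResolvePS(float4 pos : SV_Position) : SV_Target
{
    int2 pixel = int2(pos.xy);
    float3 sum = float3(0.0, 0.0, 0.0);

    // Accumulate the samples in tonemapped space with equal weights.
    [unroll]
    for (int i = 0; i < 4; ++i)
    {
        sum += TonemapWithWeight(msaaColor.Load(pixel, i).rgb, 0.25);
    }

    // Return to linear HDR before writing the resolved color.
    return float4(TonemapInvert(sum), 1.0);
}
```

A wider or non-uniform filter would change only the weights passed to TonemapWithWeight(), as long as they still sum to 1.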
The following shader, shown without the final TonemapInvert(), is a random 5-tap horizontal filter.
float max3(float x, float y, float z) { return max(x, max(y, z)); }
float3 TonemapWithWeight(float3 c, float w) { return c * (w * rcp(max3(c.r, c.g, c.b) + 1.0)); }
Texture2D tex0;
SamplerState smp0;
float3 main(float2 pos : TEXCOORD) : SV_Target {
    // SampleLevel() returns float4; take .rgb explicitly to avoid an implicit truncation.
    return
        TonemapWithWeight(tex0.SampleLevel(smp0, pos, 0, int2(-2,0)).rgb, 0.1) +
        TonemapWithWeight(tex0.SampleLevel(smp0, pos, 0, int2(-1,0)).rgb, 0.2) +
        TonemapWithWeight(tex0.SampleLevel(smp0, pos, 0, int2( 0,0)).rgb, 0.4) +
        TonemapWithWeight(tex0.SampleLevel(smp0, pos, 0, int2( 1,0)).rgb, 0.2) +
        TonemapWithWeight(tex0.SampleLevel(smp0, pos, 0, int2( 2,0)).rgb, 0.1);
}
v_max3_f32
v_add_f32
v_rcp_f32 <--- rcp takes 4x the runtime of other VALU (vector ALU) operations
v_mul_f32 <--- folds the scalar filter weight into the tonemap weight before multiplying by the color
---------
v_mac_f32 <--- multiply by weight and accumulate with the weighted sum
v_mac_f32 <--- multiply by weight and accumulate with the weighted sum
v_mac_f32 <--- multiply by weight and accumulate with the weighted sum
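A rough cost estimate can be read off the listing above. The arithmetic below is a back-of-the-envelope sketch, assuming the listing is the per-tap sequence, that every instruction other than v_rcp_f32 issues at full rate, and that texture fetch cost and latency hiding are ignored. The symbol names are introduced here for illustration only.

```latex
\begin{align*}
\text{cost}_{\text{tap}} &= \underbrace{1}_{\texttt{v\_max3}} + \underbrace{1}_{\texttt{v\_add}}
  + \underbrace{4}_{\texttt{v\_rcp}} + \underbrace{1}_{\texttt{v\_mul}}
  + \underbrace{3}_{3 \times \texttt{v\_mac}} = 10 \\
\text{cost}_{\text{5 taps}} &= 5 \times 10 = 50 \\
\text{rcp share} &= \frac{4}{10} = 40\%
\end{align*}
```

Under these assumptions the reciprocal is the largest single contributor to the per-tap ALU cost, which is consistent with the note on v_rcp_f32 in the listing.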