Part 2: Extending AMD FidelityFX Brixelizer GI

Originally posted: July 17, 2024

Last updated: July 22, 2024

Petter Blomkvist

To improve the quality of foliage in Brixelizer GI, we must modify the algorithm to support alpha-clipped geometry. In essence, we want to alter the Brixelizer GI probe trace to sample the alpha of each surface hit point. This alpha value is then used to determine if the surface is transparent. If deemed transparent, we can perform an additional trace using the surface hit point as the ray origin. This process can be repeated until we encounter an opaque surface containing the final radiance.

Alpha cache

As mentioned in Part 1, Brixelizer does not provide an API for sampling the material parameters of a surface intersection. Therefore, we need to implement another secondary cache that stores the alpha value of geometry. Like the Radiance Cache, this Alpha Cache is sparse: for each Brick ID, we allocate a 4x4x4 grid of alpha values, which gives the brick's surface a higher-resolution transparency than a single value per brick would.
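To make the layout concrete, here is a minimal C++ sketch of how a hit position could be mapped to one of the 64 cells of its brick. The flat layout (Brick ID times 64 plus a local cell index), the `AlphaCell` struct, and the assumption that the brick's world-space bounds (`brickMin`, `brickSize`) are known at lookup time are illustrative choices, not Brixelizer API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative layout: each Brick ID owns 4x4x4 = 64 consecutive cells.
constexpr uint32_t kCellsPerAxis  = 4;
constexpr uint32_t kCellsPerBrick = kCellsPerAxis * kCellsPerAxis * kCellsPerAxis;

struct AlphaCell {
    float    alpha   = 0.0f; // "r" channel: averaged alpha
    uint32_t samples = 0;    // "g" channel: stored-samples counter
};

struct Float3 { float x, y, z; };

// Maps a world-space position inside a brick to the index of its cell.
// brickMin and brickSize (the brick's world-space AABB) are assumed to be known.
uint32_t CellIndex(uint32_t brickId, Float3 pos, Float3 brickMin, float brickSize) {
    assert(brickSize > 0.0f);
    auto cellOnAxis = [&](float p, float lo) {
        float t = std::clamp((p - lo) / brickSize, 0.0f, 1.0f); // 0..1 inside the brick
        uint32_t cell = static_cast<uint32_t>(t * kCellsPerAxis); // 0..4
        return std::min(cell, kCellsPerAxis - 1);                 // fold t == 1 into the last cell
    };
    uint32_t x = cellOnAxis(pos.x, brickMin.x);
    uint32_t y = cellOnAxis(pos.y, brickMin.y);
    uint32_t z = cellOnAxis(pos.z, brickMin.z);
    return brickId * kCellsPerBrick + (z * kCellsPerAxis + y) * kCellsPerAxis + x;
}
```

A cache for N bricks would then be an array of N * 64 `AlphaCell` entries; the two fields mirror the `r` (alpha) and `g` (stored samples) channels used in the insertion pseudocode.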

Alpha buffer

As with the Radiance Cache, we need a way to fill the new Alpha Cache. The naive solution is to render an additional Alpha GBuffer that exports the alpha of each surface, and then emit this buffer into the scene with screen-space voxelization, as we do for the Radiance Cache. Unfortunately, this is not adequate, because alpha-tested cards often obscure other cards, as is the case for the grass. The image below illustrates this: most of the alpha values of the two obscured cards are never exported.

Three overlapping grass cards: the alpha of the obscured cards is never captured

There are a couple of ways to remedy this. One would be to connect the Alpha Cache directly to the Alpha Buffer pass and use unrestricted memory accesses to store the alpha of every surface. This would require a relatively significant change to the Brixelizer GI API, so we opt for something simpler. To capture surfaces at every depth level, we inject a geometry shader into the Alpha Buffer pass that randomizes the Z-value of each triangle primitive. Because we perform screen-space voxelization every frame, these randomized Alpha Buffers have proved sufficient to fill the Alpha Cache quickly enough.
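The effect of the injected geometry shader can be emulated on the CPU as below. This is a minimal C++ sketch, assuming D3D-style clip space (depth z/w in [0, 1]) and a per-primitive random value derived from the primitive ID and frame index with the PCG hash from Jarzynski and Olano's "Hash Functions for GPU Rendering"; the post does not say which random source it uses. All three vertices receive the same random depth, so each triangle lands on one random depth layer per frame.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Float4 { float x, y, z, w; }; // clip-space position

// PCG hash from Jarzynski and Olano, "Hash Functions for GPU Rendering" (JCGT, 2020).
uint32_t PcgHash(uint32_t v) {
    uint32_t state = v * 747796405u + 2891336453u;
    uint32_t word  = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}

// Uniform float in [0, 1) built from the top 24 bits of a hash.
float HashToUnitFloat(uint32_t h) {
    return static_cast<float>(h >> 8) * (1.0f / 16777216.0f);
}

// CPU emulation of the injected geometry shader: every vertex of the triangle
// gets the same random depth, re-drawn each frame. In HLSL the primitive ID
// would come from SV_PrimitiveID.
std::array<Float4, 3> RandomizeTriangleDepth(std::array<Float4, 3> tri,
                                             uint32_t primitiveId, uint32_t frameIndex) {
    const float depth = HashToUnitFloat(PcgHash(primitiveId ^ PcgHash(frameIndex)));
    for (Float4& v : tri) {
        v.z = depth * v.w; // after the perspective divide, z / w == depth
    }
    return tri;
}
```

Writing `depth * w` into `z`, rather than `depth` itself, keeps the value correct after the hardware's perspective divide. The flattened triangle no longer carries meaningful depth, which is why the true position has to be recovered some other way.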

Randomizing Z-values each frame captures all alpha values over time

As with the Radiance Cache, we also need a way to reconstruct the world-space coordinates of the Alpha Buffer fragments. One option is to do what we do for the Radiance Cache: reconstruct the coordinates from the depth buffer and the perspective-view matrix. Since the Alpha Buffer's depth consists of randomized values, however, we would have to write the original depth to a separate output channel.
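For reference, depth-based reconstruction looks roughly like the following C++ sketch. It assumes a row-major matrix used with column vectors, D3D-style NDC (y up, depth in [0, 1]), and UVs with a top-left origin; `WorldFromDepth` is a hypothetical helper. Feeding it the randomized Alpha Buffer depth instead of the original depth would unproject to the wrong point along the view ray, which is why the original depth would need its own channel.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<std::array<float, 4>, 4>; // row-major, used with column vectors
struct Float3 { float x, y, z; };

// Reconstructs a world-space position from a pixel's UV and its ORIGINAL depth,
// given the inverse of the view-projection matrix that rendered the pixel.
Float3 WorldFromDepth(float u, float v, float depth, const Mat4& invViewProj) {
    // UV (0..1, top-left origin) -> NDC (-1..1, +y up); D3D-style depth in [0, 1].
    const std::array<float, 4> ndc = {u * 2.0f - 1.0f, 1.0f - v * 2.0f, depth, 1.0f};
    std::array<float, 4> p{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            p[r] += invViewProj[r][c] * ndc[c];
    assert(p[3] != 0.0f);
    return {p[0] / p[3], p[1] / p[3], p[2] / p[3]}; // perspective divide
}
```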

Another option is to write each fragment's world-space coordinates directly to three channels of the Alpha Buffer. Reconstruction then becomes as simple as reading back the coordinates of each fragment.

Writing world-space coordinates directly to the Alpha Buffer also means that Brixelizer GI does not need to know how the fragments were projected, which makes it trivial to use perspective-view transforms other than those of the other GBuffer passes. We use this to capture out-of-view alpha information, such as leaves above the camera that need to let skylight through. Every other frame, we render from the camera position along a random axis-aligned direction. We do this only every other frame because we want the in-view part of the Alpha Cache to update more frequently. The image below illustrates the process: the main camera frustum is rendered on the first, third, and fifth frames, while the out-of-view frustums are rendered on the even frames.
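The frame schedule can be sketched as a small selection function. This minimal C++ sketch assumes a 0-based frame counter (so indices 0, 2, 4 correspond to the first, third, and fifth frames) and takes the random bits from the caller; `SelectAlphaView` and the six-entry direction table are illustrative names, not part of Brixelizer GI.

```cpp
#include <cassert>
#include <cstdint>

struct Float3 { float x, y, z; };

enum class AlphaViewKind { MainCamera, OutOfView };

struct AlphaPassView {
    AlphaViewKind kind;
    Float3        forward; // axis-aligned look direction; unused for MainCamera
};

// The six axis-aligned directions an out-of-view frustum can face.
constexpr Float3 kAxisDirections[6] = {
    { 1.f, 0.f, 0.f}, {-1.f,  0.f, 0.f},
    { 0.f, 1.f, 0.f}, { 0.f, -1.f, 0.f},
    { 0.f, 0.f, 1.f}, { 0.f,  0.f, -1.f}};

// frameIndex is 0-based: even indices (the first, third, fifth... frames) render
// the main camera; odd indices render from the camera position along a random axis.
AlphaPassView SelectAlphaView(uint32_t frameIndex, uint32_t randomBits) {
    if (frameIndex % 2 == 0) {
        return {AlphaViewKind::MainCamera, {0.f, 0.f, 0.f}};
    }
    return {AlphaViewKind::OutOfView, kAxisDirections[randomBits % 6]};
}
```

Building the view matrix for an out-of-view direction additionally needs an up vector that is not parallel to `forward` (for example +Z when looking along ±Y).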

Every other frame, we render a different out-of-view perspective.

We need one more special case in the Alpha Buffer. In the final GI, non-alpha-tested geometry must remain fully opaque; otherwise, light may leak where foliage intersects such objects. If a fragment belongs to such an object, we write a special value to the alpha channel, which lets the Alpha Cache insertion pass treat it differently.

The following pseudocode computes the final value of an Alpha Buffer fragment.

<code>output.rgb = input.origPosition; // Write world-space coordinates
if (blendMode == Opaque) { // If fragment belongs to an opaque object
    output.a = OPAQUE_FRAGMENT; // Write special value for opaque geometry
} else {
    output.a = input.alpha; // Otherwise, just write alpha
}
</code>
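A C++ version of the same encoding makes the sentinel concrete. This sketch assumes `OPAQUE_FRAGMENT` is -1.0, a value that a real alpha in [0, 1] can never take; the post does not state which value it uses. It also assumes a floating-point render target (for example, 32-bit float RGBA), since a UNORM target would clamp the sentinel to 0 and could not hold world-space positions anyway.

```cpp
#include <cassert>

struct Float3 { float x, y, z; };
struct AlphaTexel { float r, g, b, a; }; // one Alpha Buffer texel

// Assumed sentinel: outside [0, 1], so it can never collide with a real alpha.
constexpr float OPAQUE_FRAGMENT = -1.0f;

enum class BlendMode { Opaque, AlphaTested };

AlphaTexel EncodeAlphaFragment(Float3 worldPos, float alpha, BlendMode blendMode) {
    AlphaTexel out{worldPos.x, worldPos.y, worldPos.z, 0.0f}; // world-space position in RGB
    out.a = (blendMode == BlendMode::Opaque) ? OPAQUE_FRAGMENT : alpha;
    return out;
}

// What the insertion pass checks to route a fragment to the opaque or alpha path.
bool IsOpaqueFragment(const AlphaTexel& texel) { return texel.a == OPAQUE_FRAGMENT; }
```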

Alpha cache insertion

We insert fragments into the Alpha Cache in two steps. First, we emit non-opaque fragments: the alpha value of each fragment is blended into the Alpha Cache at its reconstructed world-space coordinate. The blend factor is inversely proportional to the number of alpha fragments already stored at that cache location, so rough estimates are written quickly at first and are then averaged over time. The stored-samples counter at the cache location is incremented up to a maximum of 64, at which point each new fragment has the smallest influence on the stored value. Each alpha fragment is emitted into the cache three times, jittered along all axes.

In the second step, we insert opaque fragments into the cache. These are jittered less, to avoid bleeding onto alpha, and are not blended: an opaque fragment overwrites the entire alpha value at its location in the cache. It also sets the stored-samples counter to an above-maximum value of 128, which gives later alpha fragments a very small blend factor at that location and makes it harder for them to bleed onto it.

So that a counter at 128 can eventually return to 64, we decrement it instead of incrementing it when an alpha value is stored while the counter is above 64.

Pseudocode for the Alpha Cache insertion is shown below.

<code>// Alpha fragments
for (fragment in fragments) {
    ws_coord = fragment.xyz;
    if (fragment.a != OPAQUE_FRAGMENT) {
        for (i = 0; i < 3; i++) { // Emit each alpha fragment three times
            jws_coord = jitter(ws_coord); // Jitter along all axes
            blendFactor = 1.0f / (alphaCache[jws_coord].g + 1); // Inversely proportional to stored samples; +1 avoids division by zero
            alphaCache[jws_coord].r = lerp(alphaCache[jws_coord].r, fragment.a, blendFactor); // Blend alpha value
            if (alphaCache[jws_coord].g < 64)
                alphaCache[jws_coord].g += 1; // Increment stored samples
            else if (alphaCache[jws_coord].g > 64)
                alphaCache[jws_coord].g -= 1; // Decrement stored samples (decay from the opaque value)
        }
    }
}

// Opaque fragments
for (fragment in fragments) {
    ws_coord = fragment.xyz;
    if (fragment.a == OPAQUE_FRAGMENT) {
        // (Opaque fragments are jittered less than alpha fragments; jitter omitted here)
        alphaCache[ws_coord].r = 1.0f; // Fully opaque
        alphaCache[ws_coord].g = 128; // Set stored samples above the maximum of 64
    }
}
</code>
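The per-cell update rules can be checked in isolation with a runnable C++ sketch. It models a single cache cell (`alpha` for the `r` channel, `samples` for the `g` channel) and uses a blend factor of 1/(samples + 1), which turns the lerp into a running mean for the first samples and avoids dividing by zero on an empty cell. Jittering, cache addressing, and the write races a GPU implementation would have to deal with are left out; all names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

struct AlphaCell {
    float    alpha   = 0.0f; // "r": averaged alpha
    uint32_t samples = 0;    // "g": stored-samples counter
};

constexpr uint32_t kMaxAlphaSamples = 64;
constexpr uint32_t kOpaqueSamples   = 128;

// One write of an alpha-tested fragment (the pseudocode issues three, jittered).
void InsertAlpha(AlphaCell& cell, float alpha) {
    assert(alpha >= 0.0f && alpha <= 1.0f);
    // 1 / (n + 1): a running mean for the first samples, no division by zero,
    // and a tiny weight once the counter is high.
    const float blend = 1.0f / static_cast<float>(cell.samples + 1);
    cell.alpha += (alpha - cell.alpha) * blend; // lerp(cell.alpha, alpha, blend)
    if (cell.samples < kMaxAlphaSamples) {
        ++cell.samples;                         // still converging
    } else if (cell.samples > kMaxAlphaSamples) {
        --cell.samples;                         // decaying back from the opaque value
    }
}

// Opaque fragments overwrite the cell without blending.
void InsertOpaque(AlphaCell& cell) {
    cell.alpha   = 1.0f;
    cell.samples = kOpaqueSamples;
}
```

With this rule, the first alpha write after an opaque overwrite moves the stored value only 1/129 of the way toward the fragment's alpha, and the counter then decays by one per write until it rejoins the normal 64-sample regime.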

Tracing the cache

Finally, we need a new trace function for Brixelizer GI that takes advantage of the Alpha Cache. After a Brixelizer trace, we stochastically determine whether the hit surface is transparent: we fetch the alpha value stored at the hit location in the Alpha Cache, and if it is less than a uniformly distributed random value between 0 and 1, the surface is deemed transparent. Thanks to this stochasticity, a surface area consisting of equal parts opaque and transparent fragments (a stored alpha of 0.5) is traced through 50% of the time, providing a form of semi-transparency.

If a hit surface is deemed transparent, we issue an additional Brixelizer trace whose origin lies just beyond the surface. When this new trace hits a surface, we again determine its transparency and re-trace if needed. At most 16 traces are issued; the surface hit by the last one is treated as opaque regardless of its stored alpha.

Once an opaque surface is reached, the ray hit payload of that intersection is returned. The pseudocode for this trace is shown below.

<code>func new_trace(ray_description) {
    ray_hit;
    for (i = 0; i < 16; i++) { // At most 16 traces
        ray_hit = brixelizer_trace(ray_description); // Perform Brixelizer trace
        if (!ray_hit.hit) // Ray missed all geometry (e.g. escaped to the sky)
            return ray_hit;
        alpha = alphaCache[ray_hit.pos].r;
        if (alpha >= random()) { // random() is uniform in [0, 1); surface is deemed opaque
            return ray_hit; // Return ray hit
        }
        ray_description.origin = ray_hit.pos + ray_description.dir * EPS; // Move the next ray origin to the other side of the surface
    }
    return ray_hit; // After 16 traces, the last hit is treated as opaque
}
</code>
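To see the stochastic behavior, the trace loop can be simulated in C++ against a toy scene: surfaces are described only by their distance along the ray and their cached alpha, and `TraceScene` is a hypothetical stand-in for `brixelizer_trace` that returns the nearest surface beyond the ray origin. The 16-trace cap, the `alpha >= random()` test, and the epsilon step follow the pseudocode; the epsilon value itself is an assumption.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Toy scene: surfaces along a single ray, described by distance and cached alpha.
struct Surface { float t; float alpha; };
struct Hit { bool hit; float t; float alpha; };

constexpr int   kMaxTraces = 16;
constexpr float kEps       = 1e-3f; // assumed offset past the surface

// Hypothetical stand-in for brixelizer_trace: nearest surface beyond tMin.
// `surfaces` must be sorted by increasing t.
Hit TraceScene(const std::vector<Surface>& surfaces, float tMin) {
    for (const Surface& s : surfaces) {
        if (s.t > tMin) return {true, s.t, s.alpha};
    }
    return {false, 0.0f, 0.0f};
}

Hit StochasticAlphaTrace(const std::vector<Surface>& surfaces, std::mt19937& rng) {
    std::uniform_real_distribution<float> uniform01(0.0f, 1.0f);
    float origin = 0.0f;
    Hit hit{false, 0.0f, 0.0f};
    for (int i = 0; i < kMaxTraces; ++i) {
        hit = TraceScene(surfaces, origin);
        if (!hit.hit) return hit;                    // escaped, e.g. to the sky
        if (hit.alpha >= uniform01(rng)) return hit; // deemed opaque
        origin = hit.t + kEps;                       // re-trace from just past the surface
    }
    return hit; // the last hit is treated as opaque regardless of alpha
}
```

Running it many times on a surface with alpha 0.5 in front of an opaque one stops at the first surface roughly half of the time, which is the semi-transparency described above.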

Let us now look at what the probes see with this new trace function. The image below compares the probe view before (left) and after (right) implementing the new trace.

What the probes see with the new foliage trace

The trees have much more defined contours, and the grass becomes slightly see-through as the alpha of the individual strands is averaged out. As a result, light can pass through the grass up to a certain depth, which allows the grass to sample the sky. This can be seen in the final GI result below.

The final foliage GI

The overbrightening observed with the naive solution is no longer present. We can still see a lot of detail in the shadowed area close to the camera, but the area under the trees becomes much darker since the leaves now provide indirect shadowing. We can even make out some ambient occlusion at the roots of the grass near the camera.

Animated foliage

To conclude this blog series, let us discuss the challenge of supporting animated foliage in Brixelizer GI. Recall that the Radiance Cache, and now the Alpha Cache, are both associated with surfaces through the Brick ID provided by Brixelizer. Unfortunately, since Brixelizer handles animated geometry by re-allocating the associated bricks, the caches cannot accumulate on such surfaces.

For the Radiance Cache, this is not a large issue. For the Alpha Cache, however, it is: since we do not generate alpha fragments for on-screen geometry every frame, the Alpha Cache holds no data for such geometry on half of the frames. We want to average the alpha of dynamic geometry over multiple frames, and to do this, we need a new data structure for the Alpha Cache.

A spatial hash map is a data structure that consistently associates each voxel in world space with the same memory location. To access a cell in the hash map, we quantize the world-space coordinates to obtain the coordinates of the containing voxel. We then hash these coordinates into a single one-dimensional value and use the resulting hash to index the data structure.
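A minimal C++ sketch of the quantize, hash, and index steps follows. It uses the large-prime XOR hash from Teschner et al., "Optimized Spatial Hashing for Collision Detection of Deformable Objects" (2003), as one common choice; the post does not say which hash function it uses, and the voxel size and table size are free parameters. Collisions are not detected, so two voxels that hash to the same slot simply share a cell.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Int3 { int32_t x, y, z; };

// Quantize: integer coordinates of the voxel containing the point.
Int3 QuantizeToVoxel(float px, float py, float pz, float voxelSize) {
    assert(voxelSize > 0.0f);
    return {static_cast<int32_t>(std::floor(px / voxelSize)),
            static_cast<int32_t>(std::floor(py / voxelSize)),
            static_cast<int32_t>(std::floor(pz / voxelSize))};
}

// Hash: the large-prime XOR hash of Teschner et al. (2003).
// Unsigned arithmetic wraps, so negative coordinates are well defined.
uint32_t HashVoxel(Int3 v) {
    return (static_cast<uint32_t>(v.x) * 73856093u) ^
           (static_cast<uint32_t>(v.y) * 19349663u) ^
           (static_cast<uint32_t>(v.z) * 83492791u);
}

// Index: the slot in a table of tableSize cells. Collisions are not detected,
// so distinct voxels that hash to the same slot share one cell.
uint32_t HashMapSlot(float px, float py, float pz, float voxelSize, uint32_t tableSize) {
    assert(tableSize > 0);
    return HashVoxel(QuantizeToVoxel(px, py, pz, voxelSize)) % tableSize;
}
```

`std::floor` matters for negative coordinates: truncating toward zero would merge the voxels on either side of each axis origin into one.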

There are many ways to optimize spatial hash maps to improve memory coherency and avoid collisions; for this experiment, we forgo all of them. Below is a comparison of what the probes see when capturing a rotating fan blade, which consists of an alpha-clipped card. The left video uses the standard Brick ID-based data structure, while the right video uses a spatial hash map. With the spatial hash map, the cache can now accumulate across frames, providing approximate transparency.

What probes see using two different Alpha Cache structures applied to an animated propeller

Here is the final GI result.

The final animated propeller GI

What’s next?

If you would like to find out more about Brixelizer GI, take a look at: Introducing AMD FidelityFX™ Brixelizer.

Petter Blomkvist

Petter is a Master's student who has completed a year-long internship with AMD's European Game Engineering Team. During this time, he conducted extensive research on software-based GI and helped ship Brixelizer™ and Brixelizer GI™.