Texel Shading

Originally posted: July 21, 2016

Karl Hillesland

Game engines do most of their shading work per-pixel or per-fragment. But there is another alternative that has been popular in film for decades: object space shading. Pixar’s RenderMan®, the most well-known renderer for Computer Graphics, uses the Reyes rendering method, which is an object space shading method.

This blog looks at an object space shading method that works on Direct3D® 11 class hardware. In particular, we’ll be looking at texture space shading, which uses the texture parameterization of the model. Shading in object space means we have already decoupled the shading rate from pixels, and it’s also easy to decouple in time. We’ve used these decouplings to find ways to improve performance, but we’ll also discuss some future possibilities in this space.

Texture Space Shading

When most people hear the term “texture space shading” they generally think of rasterizing geometry in texture space. There are two difficulties with that kind of approach: visibility and choosing the right resolution. Rasterizing in texture space means you don’t have visibility information from the camera view, and end up shading texels that aren’t visible. Also, your resolution choices are limited, as you can only rasterize at one resolution at a time.

Resolution choice is important because it drives your total shading cost. Each mipmap level costs an additional 4× shading cost. So if you need 512×512 to match pixel rate for one part of an object and 1024×1024 for another part, but you can only pick one, which do you pick? If you pick 512×512, part of the object will be under sampled. If you pick 1024×1024, part of your object will cost 4× as much as needed. That increases with every level you span. If you have an object that spans 4 mipmap levels, for example, you can increase shading cost by up to 64× to hit a particular resolution target for the whole object! The following image shows just a few levels of a single character’s texture access for the given view, (after early depth rejection has been applied), to give you an idea of just how much waste there can be when not selecting the right resolution in texture space shading.

These are the texels from three mipmap levels that are accessed for the given view.

So let’s try a different approach. Instead, we will rasterize in screen space (from the camera view), and instead of shading, we just record the texels we need as shading work. This is why we chose the term “texel shading”. Since we are rasterizing from the camera view, we get the two pieces of information that aren’t normally available in texture space methods – visibility after early depth test, and the screen space derivative for selecting the mipmap level. The work itself are the texels.

How Texel Shading Works

At a high level, the process is as follows:

Render the model, recording the texels required.
Shade the texels, writing them to a texture map of results.
Lookup results while rendering the model a second time.

Note that texel shading can be selectively applied on a per-object, or even per-fragment basis. The fragment-level choice happens at the time of the first geometry pass, so it still incurs additional cost of other stages, but it allows for fall backs to standard forward rendering. It also also allows you to split shading work between texel shading and pixel shading.

There are a few details on how this is done that I will just mention here, but hopefully you have a chance to read the Eurographics short paper on the topic to get a better idea.

One bit of information to know, is that we actually shade 8×8 tiles of texels. We have a sort of cache that has one entry per tile that we use to eliminate redundant tile shades, as well as age tracking for techniques that use shades from previous frames.

The second, is that we need to interpolate vertex attributes in the compute shader. To do so, we have a map we call the “triangle index texture” that we use to identify which triangle we need to shade a particular texel.

Texel shading consists of three stages. Identify texels at the proper mipmap level in a geometry pass, shade the texels, then use them.

The Drawbacks

This alone gives you no benefit. In fact, it incurs additional cost in a few different ways:

Adds an additional geometry pass, although it may be combined with another geometry pass, or combined with the final pass of the previous frame.
Shade count starts higher. In our experiments, it’s 20% to 100% more shades, depending on the model. One cause is that we choose resolutions conservatively to at least match pixel rate. This is also because we actually operate on 8×8 tiles, so some texels are shaded when only a subset of the 8×8 texels are visible and at the right resolution. Note I used the term “start”. That’s because we’ll look at ways to reduce this in the next section.
The overhead of shading in compute. Vertex operations get pushed to per-texel operations. This is why it’s important to consider using preskinned vertices. You must also compute barycentrics and interpolation in shader code yourself. There’s actually untapped potential here as well – with compute, you have explicit access to neighbor information and the possibility of sharing some shading cost across the texel tile.
Memory cost. Our method requires an allocation of enough texture memory to store shaded results and the triangle index texture at a “high enough” resolution. This is an area for future research.
Filtering issues. We only experimented with bilinear-point filtering (picking one resolution, bilinear filter within that mipmap level). This is not necessarily good enough for a final shade. In fact, it may be possible to do much more intelligent filtering that turns this into an advantage. In the end, though, anything done in texture space needs to go through texture filtering, which is far from ideal.

Advantages and Potential

However, once you have an object-space rendered like this, new options open up. I’ll list several of them in this section. The first two, and a bit of the third, are what we’ve tried so far. The rest are what we think is worth trying in the future.

Skip frames. Because the shading occurs in object space, it’s easier to change the temporal shading rate. Our implementation uses a cache of how old shades are, and reuses shades from previous frames, if not too old.
Variable and multi-rate shading. There’s no reason why the mipmap level choice needs to match pixel rate – you can go higher for better shader antialiasing, or lower for much reduced shading rate in lower-variance regions, and this choice can be made on a per-fragment basis. We experimented with biasing the mipmap level on a per-triangle basis by checking the normal variance, and biasing the mipmap level for lighting calculations if the triangle was relatively flat. It’s also possible to actually shade at different rates for the same fragment – in a sense, we did this by only computing lighting in texture space, and doing a texture lookup at fragment rate – but work could have been split up further to compute different parts at different resolutions at the same time.

Left image shows standard (top) vs texel shading (bottom) reusing frames (2) and varying shade rate (2) . Graph shows shade time as a function of shadowed light count for standard shading (blue), texel shading (green), adding temporal shade reuse (red) and adding variable shading rate (cyan).

Decoupled geometry. In a forward renderer, shading efficiency drops quickly with small triangles, and gets worse with Multisampling Anti-Aliasing (MSAA) when samples in a pixel are covered by more than one triangle. Texel shading suffers from neither of these problems, because the shading rate is tied to texel shading rate. We found cases in which texel shading actually outperformed standard forward rendering, before we even got to skipping frames or changing shading rate. But there’s actually more opportunity here. The geometry used for shading can actually be completely different from the triangles that are rasterized in screen space. They can also be different at different mipmap levels. This provides additional opportunity for geometric anti-aliasing that we have not explored.
Object space, temporal anti-aliasing. Rather than skipping frames to save shading cost (point 2 above), one could instead shade every frame, but average in shades from previous frames as is done in screen space temporal anti-aliasing. Performing this operation in object space avoids some artifacts and workarounds that are necessary for screen-space temporal anti-aliasing.
Stochastic effects. One attractive feature of object space shading, is that you can reuse shades while applying depth of field or motion blur, without worrying about screenspace artifacts. In fact, this was a major motivation for the Reyes rendering system. There are also methods of finding the right mipmap bias to save shading when rendering a blurred object. This is an item we have not explored.
Better filtering. Shading in object space, in compute, where we have access to neighboring texels, and in a sort of multi-resolution space, opens up further opportunity for shader filtering. We have not explored this as much as we would like.
Asyncronous shading and lighting. This was actually the original motivation for this research. Skipping frames is a simpler first step. However, if you have some fall-back system for shades that are not ready in time, you could potentially compute shading completely asynchronously, while still updating the projection of the model on the screen at full rate, using whatever the most recent shade value. This is another area we would like to explore.
Stereo and other multi-view scenarios. It’s also possible to reuse shading calculations across eyes for VR, with the caveat that many specular effects may not work well. With asynchronous shading or lighting, you could even share shading across a network of individual users looking at the same scene, such as a multi-player game.

Integration into Existing Engines

We’re not the first to try something like this. The Nitrous engine, for example, uses a texture space shading technique that is suitable for the Real Time Strategy (RTS) style games they target. They don’t have to worry so much about objects spanning mipmap levels, or occluded texels, and so they took a different approach. I encourage you to take a look at Dan Baker’s GDC presentation. Texel shading fits more naturally into a forward rendering pass than deferred (as far as I’ve thought about it – although you could also do deferred texture space shading). It requires that the object has a unique texture parameterization, as is often done for a lightmap. It also requires space for the texture of shaded results, and a choice as to how high the resolution needs to be. This depends on how the texture is to be used – not all the shading needs to be done in texture space. If there are textures that do not use the unique parameterization, additional derivatives are required for mipmap level lookups. This basically fills the function of the screen space derivatives in standard mipmapping, but instead trying to match the rate of the second texture with the first. However, since the mapping from one UV set to the other tends to stay fixed, it can potentially be precomputed. As far as engine integration, it would require the ability to bind the index and vertex buffer of the object during a compute shader pass, preferably a vertex buffer that is already skinned for performance reasons. Alternatively, you could pre-interpolate everything that doesn’t change dynamically, but that costs more memory and isn’t something we’ve tried. Adopting texel shading doesn’t require an all-or-nothing choice. You could apply it to a single object. You could even make a per-fragment choice within that object.

Conclusion

I’ve described a way to do object space shading with Direct3D 11 hardware. It’s a texture space approach, but uses a camera space rasterization to help with occlusion and getting the right mipmap level choices for a given view. We’ve explored some ways to reduce shader load by decoupling shading rate both spatially and temporally, but there’s a lot more to explore. You can get a bit more detail from the Eurographics 2016 short paper.

Karl Hillesland

Karl Hillesland leads AMD's Effects Team. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.