We are releasing TressFX 3.1. Our biggest update in this release is a new order-independent transparency (OIT) option we call “ShortCut”. We’ve also addressed some of the issues brought up by the community.
ShortCut
ShortCut is our new order-independent transparency (OIT) option. It’s inspired by the method presented by Eidos-Montréal and by Hybrid Transparency. Whereas our original method focused on the front k = 8 or so layers of hair, ShortCut is aimed at cases where you can get away with k = 2 or 3 and are more concerned about memory usage.
It does require some forethought about how you build your models, however, because it comes with different performance characteristics and a quality trade-off. But between the simpler memory bounds and the potential for higher performance, we expect it to be a popular choice.
The four main steps are outlined below.
- Render hair geometry, using a sequence of InterlockedMin calls to update the list of k nearest fragments while computing an overall alpha.
- Screen space pass that puts the kth nearest depth in the depth buffer for early z culling in the next step.
- Render hair geometry again. Shade the fragment and write or blend the color (depending on variant). [earlydepthstencil] focuses shading cost on the front k.
- Screen space pass that does the final blending.
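To make the first step more concrete, here is a minimal CPU sketch in Python of how a chain of min operations can keep the k nearest depths for one pixel. It is an illustration, not the shipped shader: interlocked_min is a hypothetical stand-in for the HLSL InterlockedMin intrinsic (store the minimum, hand back the previous value), the FAR clear value is an assumption, and accumulating alpha as a product of (1 - alpha) terms is our assumption about what "an overall alpha" could look like.

```python
FAR = 0xFFFFFFFF  # assumed clear value: farther than any real fragment


def interlocked_min(slots, index, value):
    """CPU stand-in for HLSL InterlockedMin: store the min, return the old value."""
    previous = slots[index]
    slots[index] = min(previous, value)
    return previous


def insert_fragment(depths, depth):
    """Insert one fragment depth into a pixel's sorted list of k nearest depths."""
    carried = depth
    for i in range(len(depths)):
        previous = interlocked_min(depths, i, carried)
        # Whichever value lost the comparison is pushed on to the next slot.
        carried = max(previous, carried)


def render_pixel(fragments, k=3):
    """Process (depth, alpha) fragments in arrival order for a single pixel."""
    depths = [FAR] * k
    transmittance = 1.0
    for depth, alpha in fragments:
        insert_fragment(depths, depth)
        transmittance *= 1.0 - alpha  # assumed form of the alpha accumulation
    return depths, 1.0 - transmittance


depths, alpha = render_pixel([(50, 0.5), (10, 0.5), (40, 0.0), (20, 0.0), (30, 0.0)])
print(depths, alpha)
```

Each slot keeps the smaller of its current value and the incoming one, so after all fragments have been processed the slots hold the k nearest depths in sorted order, whatever order the fragments arrived in. On the GPU the same chain runs concurrently across many fragments, which this single-threaded sketch does not model.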
With the original method, you needed to allocate a single memory pool large enough for all hair fragments, not just the front k. With ShortCut, you only need space for the front k layers: 4 bytes for each depth, 4 bytes for each color, and 4 bytes for an accumulated alpha term for each pixel. Another difference from our previous method is that, although you still get the performance advantage of shading only the front k layers, you don’t need to store the shader inputs in screen space.
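As a rough worked example of those sizes, the Python sketch below adds up the per-pixel cost and scales it to a full render target. It assumes one 4-byte depth and one 4-byte color per layer plus a single 4-byte alpha per pixel, which is our reading of the numbers above. The function names and the 1920x1080 resolution are illustrative only.

```python
BYTES_PER_DEPTH = 4  # one per layer
BYTES_PER_COLOR = 4  # one per layer (assumed)
BYTES_PER_ALPHA = 4  # one accumulated term per pixel


def shortcut_bytes_per_pixel(k):
    """Per-pixel storage for the front k layers plus the accumulated alpha."""
    return k * (BYTES_PER_DEPTH + BYTES_PER_COLOR) + BYTES_PER_ALPHA


def shortcut_buffer_bytes(width, height, k=3):
    """Total storage across a width x height render target."""
    return width * height * shortcut_bytes_per_pixel(k)


for k in (2, 3):
    total = shortcut_buffer_bytes(1920, 1080, k)
    print(f"k={k}: {shortcut_bytes_per_pixel(k)} bytes/pixel, {total / 2**20:.1f} MiB")
```

Under these assumptions the footprint depends only on resolution and k, not on how many hair fragments land on each pixel, which is what makes the bound simpler than a shared pool sized for every fragment.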
ShortCut’s main drawback is its extra geometry passes. It’s still a performance win when the depth complexity is high relative to the geometry cost, as it is for the included “ponytail” model. It also doesn’t give quite the same quality as the Per-Pixel Linked List (PPLL) method. However, as long as you are aware of these trade-offs, you should be able to create content that works well within these constraints.
We’ve also included some additional compile-time choices. The default version uses k = 3, and one compile-time switch changes this to k = 2. A second compile-time switch enables a non-deterministic mode, which can improve performance in some configurations.
Working with the GitHub Community
We’ve received some terrific feedback from the community. This update also addresses some of the issues they have brought up.
One issue was caused by skipping simulation steps when the frame rate exceeds 60 Hz. This caused fur on the bear to sometimes separate from the underlying mesh, since the animation would move forward but the simulation would not. This issue was identified by mrgreywater, who went on to also provide a fix for us! In this release, we’re providing an alternative fix that’s also a little easier on performance with the ShortCut method.
We’re also looking at some changes to the library structure and API to enable easier engine integration. The community has already given us useful feedback on these issues (13, 14). These changes are still in the works, and we invite more input.