The v1.2 update for GeometryFX introduces cluster culling. Previously, GeometryFX worked on a per-triangle level only. With cluster culling, GeometryFX is able to reject large chunks of the geometry – with corresponding performance increases. Cluster culling is not a new idea – last year at SIGGRAPH, Ubisoft presented a GPU based rendering pipeline which incorporated cluster culling as well.
Before we can dive into the cluster culling path in GeometryFX, some background information is needed. The core loop in GeometryFX works as following: each draw call is split up into small chunks of 256 triangles, which are then packed until a couple of hundred chunks are pending. This batch of chunks is then filtered in one go – going wide on the filtering step is necessary to get good GPU usage. Filtering each chunk individually would result in too many and too small dispatches.
Splitting up draw calls into chunks, and packing them into batches happens on the CPU side. Every time a draw is submitted, the library creates chunks and adds them to the current batch. Once the batch is full, it is sent to the GPU for processing while the library fills up the next batch.
Cluster culling in GeometryFX 1.2 is purely a CPU side operation. While a draw is processed, each chunk is checked and culled if determined to be invisible. This is a brand-new feature of GeometryFX 1.2 – previously, all filtering was done on a per-triangle level on the GPU. With cluster culling, another filtering level is introduced which processes whole chunks of geometry. As the cluster culling happens on the CPU, we need a very fast check, or the additional CPU overhead is not going to provide a benefit.
The idea is to cull entire clusters which are completely back-facing. For this, we need to quickly test if the current camera position is guaranteed to only see backfaces. The basic intuition here is that every triangle splits the world into two half-spaces – the positive and negative one. If you take a point which is inside the union of all negative half spaces of all triangles in a cluster, you’re guaranteed that you can cull the cluster without having to check each individual triangle.
This space may not always exist though. For example opposing triangles have no point from which it is safe to cull both triangles. For most real-world scenes however, clusters of triangles tend to form small surface patches which can be culled together. The main problem is that we haven’t gained anything with this test yet because testing if we’re in the union of all half-spaces is just as expensive as testing all triangles individually.
The key insight here is that we can approximate this space – I’ll call it “safe space” going forward – using some other, simpler geometric primitive. In GeometryFX I’m using a cone, as it works in many real-world situations and is very fast to test against. The only remaining problem now is the creation of that cone.
The algorithm I use is a linear-time pre-process per chunk, which is very fast. It starts by finding one point in the safe space. This happens in two steps. First, we start at the center of the bounding box of the cluster and compute a direction in which to move. This direction is simply the sum of the negative normals of all triangles – remember, the safe space is in the negative half space, so a vector pointing into the negative normal of all triangles is a good starting point. In the second step, the ray formed by the center of the bounding box and this direction is intersected with all triangles, and the maximum distance is taken. This guarantees that the point we found is in the negative half space of all triangles.
With the point and the direction, we only need the cone opening angle next. In the second step, I walk over all triangles and restrict the cone such that the triangle plane does not intersect it. Eventually, I end up with a cone which is inside the “safe space”, and in general, a reasonably good approximation of it.
At run-time, all that is needed is a point-in-cone check, which is super-cheap – a vector subtraction and a dot product is enough. In order to showcase this, there’s also a new option in GeometryFX to generate some sample geometry. This option is enabled by adding “-generate-geometry:true” on the command line. With that, you’ll get a set of wavy patches rendered.
Other content by Matthäus Chajdas
Asynchronous compute can help you to get the maximum GPU usage. I’ll be explaining the details based on the nBodyGravity sample from Microsoft.
GCN hardware supports a special out-of-order rasterization mode which relaxes the ordering guarantee, and allows fragments to be produced out-of-order.
With shader extensions, we provide access to a much better tool to get compaction done: GCN provides a special op-code for compaction within a wavefront.
One of the mandates of GPUOpen is to give developers better access to the hardware, and this post details extensions for Vulkan and Direct3D12 that expose additional GCN features to developers.
Vulkan barriers are unique as they requires you to provide what resources are transitioning and also specify a source and destination pipeline stage.
One thing which is often forgotten is shadow map rendering. As the tessellation level of the terrain is not optimized for the shadow camera, but for the primary camera, this often results in a very strong mismatch and shadow maps end up getting extremely over-tessellated.
mGPU isn’t just for gamers – if you’re a developer working on a game, you should think of using mGPU to make your life easier.
One of our engineers explains a few small code changes that can help you integrate RenderDoc for more unconventional applications.