The v1.2 update for GeometryFX introduces cluster culling. Previously, GeometryFX worked on a per-triangle level only. With cluster culling, GeometryFX can reject large chunks of geometry at once, with a corresponding performance increase. Cluster culling is not a new idea: last year at SIGGRAPH, Ubisoft presented a GPU-based rendering pipeline which incorporated cluster culling as well.

Background

Before we can dive into the cluster culling path in GeometryFX, some background information is needed. The core loop in GeometryFX works as follows: each draw call is split up into small chunks of 256 triangles, which are then packed until a couple of hundred chunks are pending. This batch of chunks is then filtered in one go. Going wide on the filtering step is necessary to get good GPU usage; filtering each chunk individually would result in too many dispatches that are each too small.

Splitting up draw calls into chunks, and packing them into batches happens on the CPU side. Every time a draw is submitted, the library creates chunks and adds them to the current batch. Once the batch is full, it is sent to the GPU for processing while the library fills up the next batch.
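To make the chunk-and-batch loop concrete, here is a minimal sketch in Python. It is not the library's code: the `Chunk` and `Batcher` names, the batch capacity of 200 and the `submitted` list that stands in for a GPU dispatch are all assumptions made for illustration. Only the chunk size of 256 triangles comes from the description above.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 256      # triangles per chunk, as described in the text
BATCH_CAPACITY = 200  # hypothetical stand-in for "a couple of hundred chunks"


@dataclass
class Chunk:
    draw_id: int
    first_triangle: int
    triangle_count: int


@dataclass
class Batcher:
    capacity: int = BATCH_CAPACITY
    pending: list = field(default_factory=list)
    submitted: list = field(default_factory=list)  # stands in for GPU dispatches

    def add_draw(self, draw_id, triangle_count):
        """Split one draw into chunks and append them to the current batch."""
        for first in range(0, triangle_count, CHUNK_SIZE):
            count = min(CHUNK_SIZE, triangle_count - first)
            self.pending.append(Chunk(draw_id, first, count))
            if len(self.pending) == self.capacity:
                self.flush()

    def flush(self):
        """Hand the current batch over for filtering and start a new one."""
        if self.pending:
            self.submitted.append(self.pending)
            self.pending = []
```

With a capacity of 4, a draw of 1000 triangles yields chunks of 256, 256, 256 and 232 triangles, which fill exactly one batch. A per-chunk visibility check would slot in right before the `append` call.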

Cluster Culling

Cluster culling in GeometryFX 1.2 is purely a CPU side operation. While a draw is processed, each chunk is checked and culled if determined to be invisible. This is a brand-new feature of GeometryFX 1.2 – previously, all filtering was done on a per-triangle level on the GPU. With cluster culling, another filtering level is introduced which processes whole chunks of geometry. As the cluster culling happens on the CPU, we need a very fast check, or the additional CPU overhead is not going to provide a benefit.

The idea is to cull entire clusters which are completely back-facing. For this, we need to quickly test if the current camera position is guaranteed to only see backfaces. The basic intuition here is that the plane of every triangle splits the world into two half-spaces, a positive and a negative one. If the camera is at a point inside the intersection of the negative half-spaces of all triangles in a cluster, you’re guaranteed that you can cull the cluster without having to check each individual triangle.

This space may not always exist though. For example, opposing triangles have no point from which it is safe to cull both triangles. For most real-world scenes however, clusters of triangles tend to form small surface patches which can be culled together. The main problem is that we haven’t gained anything with this test yet, because testing if we’re inside the negative half-spaces of all triangles is just as expensive as testing all triangles individually.
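The exact test described above can be written down directly. The following Python sketch assumes counter-clockwise front faces, so a triangle's normal is `cross(b - a, c - a)` and the camera sees its back face when it lies on the negative side of the plane. The helper names are made up for this example. Note that the loop touches every triangle, which is exactly the cost we want to avoid.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sees_only_backfaces(camera, triangles):
    """True if the camera is strictly in the negative half-space of every triangle.

    Assumes counter-clockwise winding for front faces.
    """
    for a, b, c in triangles:
        normal = cross(sub(b, a), sub(c, a))
        if dot(normal, sub(camera, a)) >= 0.0:
            return False
    return True
```

For a single triangle in the z = 0 plane whose normal points along +z, a camera at (0, 0, -1) passes the test. Add a second triangle at z = 1 that faces the first one, and no camera position passes any more.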

The key insight here is that we can approximate this space – I’ll call it “safe space” going forward – using some other, simpler geometric primitive. In GeometryFX I’m using a cone, as it works in many real-world situations and is very fast to test against. The only remaining problem now is the creation of that cone.


The algorithm I use is a linear-time pre-process per chunk, which is very fast. It starts by finding one point in the safe space. This happens in two steps. First, we start at the center of the bounding box of the cluster and compute a direction in which to move. This direction is simply the sum of the negated normals of all triangles. Remember, the safe space is in the negative half-space, so a vector pointing along the negated normals of all triangles is a good starting point. In the second step, the ray formed by the center of the bounding box and this direction is intersected with all triangles, and the maximum distance is taken. This guarantees that the point we found is in the negative half-space of all triangles.
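The two steps for finding a point in the safe space might look roughly like the Python sketch below. It is an illustration under several assumptions of mine, not the library's implementation: normals are normalized before summing, the ray is intersected with the triangle planes rather than the triangles themselves, winding is counter-clockwise, and the function gives up (returns `None`) when the summed direction does not head into the negative side of every triangle. All names are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        return None
    return (v[0] / length, v[1] / length, v[2] / length)


def unit_normal(tri):
    a, b, c = tri
    return normalize(cross(sub(b, a), sub(c, a)))


def find_apex(triangles):
    """Return (apex, axis), or None if this heuristic finds no safe point.

    Assumes a non-empty list of non-degenerate triangles.
    """
    verts = [v for tri in triangles for v in tri]
    center = tuple((min(v[i] for v in verts) + max(v[i] for v in verts)) / 2.0
                   for i in range(3))
    normals = [unit_normal(tri) for tri in triangles]
    # Step 1: direction is the (normalized) sum of the negated normals.
    axis = normalize(tuple(-sum(n[i] for n in normals) for i in range(3)))
    if axis is None:
        return None  # normals cancel out, e.g. opposing triangles
    # Step 2: walk along the ray until we are behind every triangle plane.
    t_max = -math.inf
    for (a, _, _), n in zip(triangles, normals):
        denom = dot(n, axis)
        if denom >= 0.0:
            return None  # the ray never gets deeper behind this plane
        t_max = max(t_max, dot(n, sub(a, center)) / denom)
    apex = tuple(center[i] + t_max * axis[i] for i in range(3))
    return apex, axis
```

Taking the maximum ray parameter means the resulting point sits on the plane of at least one triangle and behind all the others.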

With the point and the direction, we only need the cone opening angle next. For this, I walk over all triangles once more and restrict the cone such that no triangle plane intersects it. Eventually, I end up with a cone which is inside the “safe space” and, in general, a reasonably good approximation of it.
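One way to restrict the cone, sketched in Python below, follows from a short geometric argument (my own derivation, not taken from the library). If the apex is already behind a plane with unit normal n, an infinite cone around the unit axis d with half-angle θ stays behind that plane as long as no direction inside the cone has a positive component along n. That holds when sin θ ≤ -n·d, so the widest safe cone uses the minimum of -n·d over all triangles. The function name and the choice to return cos θ are assumptions for this example.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit_normal(tri):
    a, b, c = tri
    n = cross(sub(b, a), sub(c, a))
    length = math.sqrt(dot(n, n))  # assumes a non-degenerate triangle
    return (n[0] / length, n[1] / length, n[2] / length)


def cone_cos_half_angle(axis, triangles):
    """Cosine of the widest half-angle that keeps the cone behind all planes.

    `axis` must be a unit vector. Returns None if no cone with a positive
    opening angle exists around this axis.
    """
    sin_half_angle = 1.0
    for tri in triangles:
        # Each plane limits the cone to sin(theta) <= -n . axis.
        sin_half_angle = min(sin_half_angle, -dot(unit_normal(tri), axis))
    if sin_half_angle <= 0.0:
        return None
    return math.sqrt(max(0.0, 1.0 - sin_half_angle * sin_half_angle))
```

Storing the cosine rather than the angle keeps trigonometry out of the run-time test. For a flat triangle plus one tilted by 45 degrees, with the axis pointing straight down, the result is a half-angle of 45 degrees.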

At run-time, all that is needed is a point-in-cone check, which is super-cheap – a vector subtraction and a dot product is enough. In order to showcase this, there’s also a new option in GeometryFX to generate some sample geometry. This option is enabled by adding “-generate-geometry:true” on the command line. With that, you’ll get a set of wavy patches rendered.
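A point-in-cone check can be sketched in Python as follows. This is one possible formulation, not necessarily the one used in the library: it compares squared quantities to avoid a square root, which costs a second dot product for the squared distance. The function and parameter names are hypothetical, and the cone is assumed to have a half-angle of at most 90 degrees with a unit-length axis.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def point_in_cone(point, apex, axis, cos_half_angle):
    """True if `point` lies inside the infinite cone starting at `apex`.

    `axis` is a unit vector and `cos_half_angle` is in [0, 1].
    """
    to_point = sub(point, apex)
    along_axis = dot(to_point, axis)
    if along_axis <= 0.0:
        return False  # behind the apex (or exactly on it)
    # cos(angle) >= cos(half_angle), written without a square root.
    limit = cos_half_angle * cos_half_angle * dot(to_point, to_point)
    return along_axis * along_axis >= limit
```

A cluster would then be culled whenever `point_in_cone(camera_position, apex, axis, cos_half_angle)` returns true for its precomputed cone.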

[Image: geofx-1.2]

With backface cluster culling turned on, roughly 30% of the geometry in the view above is removed, with no visual difference and a 30% performance uplift. The CPU overhead of the cluster culling is not measurable, so you can keep it always on: in the worst case nothing gets culled and performance stays the same.

[Image: geofx-1.2-30_culled]

Besides the cluster culling, this update also incorporates a pull request from Intel which improves the culling of primitives at the near plane. For more information, visit the GeometryFX page. That’s it for today!

Other content by Matthäus Chajdas