GeometryFX 1.2 – Cluster Culling

The v1.2 update for GeometryFX introduces cluster culling. Previously, GeometryFX worked on a per-triangle level only. With cluster culling, GeometryFX is able to reject large chunks of the geometry – with corresponding performance increases. Cluster culling is not a new idea – last year at SIGGRAPH, Ubisoft presented a GPU based rendering pipeline which incorporated cluster culling as well.

Background

Before we can dive into the cluster culling path in GeometryFX, some background information is needed. The core loop in GeometryFX works as following: each draw call is split up into small chunks of 256 triangles, which are then packed until a couple of hundred chunks are pending. This batch of chunks is then filtered in one go – going wide on the filtering step is necessary to get good GPU usage. Filtering each chunk individually would result in too many and too small dispatches.

Splitting up draw calls into chunks, and packing them into batches happens on the CPU side. Every time a draw is submitted, the library creates chunks and adds them to the current batch. Once the batch is full, it is sent to the GPU for processing while the library fills up the next batch.

Cluster Culling

Cluster culling in GeometryFX 1.2 is purely a CPU side operation. While a draw is processed, each chunk is checked and culled if determined to be invisible. This is a brand-new feature of GeometryFX 1.2 – previously, all filtering was done on a per-triangle level on the GPU. With cluster culling, another filtering level is introduced which processes whole chunks of geometry. As the cluster culling happens on the CPU, we need a very fast check, or the additional CPU overhead is not going to provide a benefit.

The idea is to cull entire clusters which are completely back-facing. For this, we need to quickly test if the current camera position is guaranteed to only see backfaces. The basic intuition here is that every triangle splits the world into two half-spaces – the positive and negative one. If you take a point which is inside the union of all negative half spaces of all triangles in a cluster, you’re guaranteed that you can cull the cluster without having to check each individual triangle.

This space may not always exist though. For example opposing triangles have no point from which it is safe to cull both triangles. For most real-world scenes however, clusters of triangles tend to form small surface patches which can be culled together. The main problem is that we haven’t gained anything with this test yet because testing if we’re in the union of all half-spaces is just as expensive as testing all triangles individually.

The key insight here is that we can approximate this space – I’ll call it “safe space” going forward – using some other, simpler geometric primitive. In GeometryFX I’m using a cone, as it works in many real-world situations and is very fast to test against. The only remaining problem now is the creation of that cone.

geometry-fx

The algorithm I use is a linear-time pre-process per chunk, which is very fast. It starts by finding one point in the safe space. This happens in two steps. First, we start at the center of the bounding box of the cluster and compute a direction in which to move. This direction is simply the sum of the negative normals of all triangles – remember, the safe space is in the negative half space, so a vector pointing into the negative normal of all triangles is a good starting point. In the second step, the ray formed by the center of the bounding box and this direction is intersected with all triangles, and the maximum distance is taken. This guarantees that the point we found is in the negative half space of all triangles.

With the point and the direction, we only need the cone opening angle next. In the second step, I walk over all triangles and restrict the cone such that the triangle plane does not intersect it. Eventually, I end up with a cone which is inside the “safe space”, and in general, a reasonably good approximation of it.

At run-time, all that is needed is a point-in-cone check, which is super-cheap – a vector subtraction and a dot product is enough. In order to showcase this, there’s also a new option in GeometryFX to generate some sample geometry. This option is enabled by adding “-generate-geometry:true” on the command line. With that, you’ll get a set of wavy patches rendered.

geofx-1.2

 
With backface cluster culling turned on, this will remove roughly 30% of the geometry in the view above, with no visual difference, but with a 30% performance uplift. The CPU overhead due to the cluster culling is not measurable, so you can keep it always on – it will never slow down things, only not make them faster.
geofx-1.2-30_culled
Besides the cluster culling, this update also incorporates a pull request from Intel which improves the handling of primitives culling the near plane. For more information, visit the GeometryFX page. That’s it for today!

Other content by Matthäus Chajdas

Matthäus Chajdas

Matthäus Chajdas

Matthäus Chajdas is a developer technology engineer at AMD. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.

Enjoy this blog post? If you found it useful, why not share it with other game developers?

You may also like...

Getting started: our software

New or fairly new to AMD’s tools, libraries, and effects? This is the best place to get started on GPUOpen!

Getting started: development and performance

Looking for tips on getting started with developing and/or optimizing your game, whether on AMD hardware or generally? We’ve got you covered!

If slide decks are what you’re after, you’ll find 100+ of our finest presentations here. Plus there’s a handy list of our product manuals!

Developer guides

Browse our developer guides, and find valuable advice on developing with AMD hardware, ray tracing, Vulkan, DirectX, UE4, and lots more.

Words not enough? How about pictures? How about moving pictures? We have some amazing videos to share with you!

The home of great performance and optimization advice for AMD RDNA™ 2 GPUs, AMD Ryzen™ CPUs, and so much more.

Product Blogs

Our handy product blogs will help you make good use of our tools, SDKs, and effects, as well as sharing the latest features with new releases.

Publications

Discover our published publications.