| GeometryFX 1.2 – Cluster Culling

The v1.2 update for GeometryFX introduces cluster culling. Previously, GeometryFX worked on a per-triangle level only. With cluster culling, GeometryFX is able to reject large chunks of the geometry, with corresponding performance increases. Cluster culling is not a new idea: last year at SIGGRAPH, Ubisoft presented a GPU-based rendering pipeline which incorporated cluster culling as well.

Background

Before we can dive into the cluster culling path in GeometryFX, some background information is needed. The core loop in GeometryFX works as follows: each draw call is split up into small chunks of 256 triangles, which are then packed until a couple of hundred chunks are pending. This batch of chunks is then filtered in one go. Going wide on the filtering step is necessary to get good GPU usage; filtering each chunk individually would result in too many dispatches, each of them too small.

Splitting up draw calls into chunks, and packing them into batches happens on the CPU side. Every time a draw is submitted, the library creates chunks and adds them to the current batch. Once the batch is full, it is sent to the GPU for processing while the library fills up the next batch.
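To make the chunk-and-batch flow concrete, here is a minimal CPU-side sketch. It is not the GeometryFX API: `Chunk`, `Batcher`, `AddDraw` and `Flush` are hypothetical names, and the batch capacity of 200 is just an assumed value for "a couple of hundred". Only the chunk size of 256 triangles comes from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the GeometryFX API.
struct Chunk {
    std::uint32_t drawId;         // draw call the chunk belongs to
    std::uint32_t firstTriangle;  // offset of the first triangle within the draw
    std::uint32_t triangleCount;  // at most kChunkSize
};

constexpr std::uint32_t kChunkSize = 256;    // triangles per chunk (from the text)
constexpr std::size_t kBatchCapacity = 200;  // assumed value for "a couple of hundred"

class Batcher {
public:
    // Split one draw into chunks and append them to the current batch.
    void AddDraw(std::uint32_t drawId, std::uint32_t triangleCount) {
        for (std::uint32_t first = 0; first < triangleCount; first += kChunkSize) {
            const std::uint32_t count = std::min(kChunkSize, triangleCount - first);
            current_.push_back({drawId, first, count});
            if (current_.size() == kBatchCapacity) Flush();
        }
    }

    // Hand over the pending chunks. A real implementation would kick off the
    // GPU filtering dispatch here; this sketch only records the batch.
    void Flush() {
        assert(current_.size() <= kBatchCapacity);
        if (current_.empty()) return;
        submitted_.push_back(std::move(current_));
        current_.clear();  // leave the moved-from vector in a known empty state
    }

    const std::vector<std::vector<Chunk>>& Submitted() const { return submitted_; }
    std::size_t PendingChunks() const { return current_.size(); }

private:
    std::vector<Chunk> current_;                 // batch currently being filled
    std::vector<std::vector<Chunk>> submitted_;  // batches already handed over
};
```

A caller would invoke `AddDraw` once per draw call and `Flush` at the end of the frame to submit whatever is still pending.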

Cluster Culling

Cluster culling in GeometryFX 1.2 is purely a CPU side operation. While a draw is processed, each chunk is checked and culled if determined to be invisible. This is a brand-new feature of GeometryFX 1.2 – previously, all filtering was done on a per-triangle level on the GPU. With cluster culling, another filtering level is introduced which processes whole chunks of geometry. As the cluster culling happens on the CPU, we need a very fast check, or the additional CPU overhead is not going to provide a benefit.

The idea is to cull entire clusters which are completely back-facing. For this, we need to quickly test if the current camera position is guaranteed to only see backfaces. The basic intuition here is that every triangle splits the world into two half-spaces: the positive and the negative one. If you take a point which is inside the intersection of the negative half-spaces of all triangles in a cluster, you’re guaranteed that you can cull the cluster without having to check each individual triangle.

This space may not always exist though. For example, two opposing triangles have no point from which it is safe to cull both. For most real-world scenes however, clusters of triangles tend to form small surface patches which can be culled together. The main problem is that we haven’t gained anything with this test yet, because testing if we’re in the intersection of all negative half-spaces is just as expensive as testing all triangles individually.
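The exact (and expensive) version of this test can be sketched as follows. This is an illustration only, not code from the library: `Vec3`, `Triangle` and `SeesOnlyBackfaces` are hypothetical names, and the sketch assumes counter-clockwise winding, so that the cross product of the two edges points into the positive half-space.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 Cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Triangle { Vec3 a, b, c; };

// Brute-force test: the eye sees only backfaces if it lies strictly inside the
// negative half-space of every triangle in the cluster.
// Assumption: counter-clockwise winding defines the front (positive) side.
bool SeesOnlyBackfaces(const std::vector<Triangle>& cluster, Vec3 eye) {
    for (const Triangle& t : cluster) {
        const Vec3 normal = Cross(t.b - t.a, t.c - t.a);
        if (Dot(normal, eye - t.a) >= 0.0f) {
            return false;  // eye is on or in front of this triangle's plane
        }
    }
    return true;
}
```

The loop touches every triangle, which is exactly the cost the cluster test is supposed to avoid.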

The key insight here is that we can approximate this space – I’ll call it “safe space” going forward – using some other, simpler geometric primitive. In GeometryFX I’m using a cone, as it works in many real-world situations and is very fast to test against. The only remaining problem now is the creation of that cone.


The algorithm I use is a linear-time pre-process per chunk, which is very fast. It starts by finding one point in the safe space. This happens in two steps. First, we start at the center of the bounding box of the cluster and compute a direction in which to move. This direction is simply the sum of the negative normals of all triangles. Remember, the safe space is in the negative half-space, so the summed negative normals are a good starting direction. In the second step, the ray formed by the center of the bounding box and this direction is intersected with all triangles, and the maximum distance is taken. This guarantees that the point we found is in the negative half-space of all triangles.
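The two steps above could look roughly like this. Treat it as a sketch under stated assumptions rather than the library's implementation: all names are hypothetical, the ray is intersected with the triangle planes (not the bounded triangles), the distance never goes below zero, and the function gives up when the summed normals cancel out or a triangle faces along the chosen direction.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 Cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline float Length(Vec3 a) { return std::sqrt(Dot(a, a)); }

struct Triangle { Vec3 a, b, c; };
struct ConeStart { Vec3 apex; Vec3 axis; };  // hypothetical result type

std::optional<ConeStart> FindConeStart(const std::vector<Triangle>& cluster) {
    if (cluster.empty()) return std::nullopt;

    // Step 1: bounding box center and the sum of the negated unit normals.
    Vec3 lo = cluster[0].a, hi = cluster[0].a, sum = {0.0f, 0.0f, 0.0f};
    for (const Triangle& t : cluster) {
        for (Vec3 p : {t.a, t.b, t.c}) {
            lo = {std::min(lo.x, p.x), std::min(lo.y, p.y), std::min(lo.z, p.z)};
            hi = {std::max(hi.x, p.x), std::max(hi.y, p.y), std::max(hi.z, p.z)};
        }
        const Vec3 n = Cross(t.b - t.a, t.c - t.a);
        const float len = Length(n);
        if (len > 0.0f) sum = sum - n * (1.0f / len);
    }
    const float sumLen = Length(sum);
    if (sumLen < 1e-6f) return std::nullopt;  // normals cancel out: no direction
    const Vec3 axis = sum * (1.0f / sumLen);
    const Vec3 center = (lo + hi) * 0.5f;

    // Step 2: move along the axis until the point is on or behind every plane.
    float distance = 0.0f;
    for (const Triangle& t : cluster) {
        const Vec3 n = Cross(t.b - t.a, t.c - t.a);
        const float facing = Dot(n, axis);
        // Also rejects degenerate triangles, for which n is the zero vector.
        if (facing >= 0.0f) return std::nullopt;
        distance = std::max(distance, -Dot(n, center - t.a) / facing);
    }
    return ConeStart{center + axis * distance, axis};
}
```

The resulting point can lie exactly on one of the planes, so a consumer that needs a strict guarantee may want to push it slightly further along the axis.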

With the point and the direction, we only need the cone opening angle next. For this, I walk over all triangles once more and restrict the cone such that no triangle plane intersects it. Eventually, I end up with a cone which is inside the “safe space” and, in general, a reasonably good approximation of it.
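One way to derive the opening angle, given as a sketch and not necessarily how GeometryFX does it: if the cone axis makes an angle α with a triangle's negated normal, the cone stays behind that triangle's plane as long as its half-angle θ satisfies θ ≤ 90° − α, which is the same as cos θ ≥ sin α. Taking the largest sin α over all triangles gives the tightest bound. The function name and the choice to store the cosine of the half-angle are assumptions; the apex is assumed to already lie on or behind every plane.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

struct Vec3 { float x, y, z; };
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns cos(half-angle) of the widest cone around `axis` that no triangle
// plane cuts into, or nothing if such a cone does not exist.
// Assumptions: `unitNormals` and `axis` are normalized.
std::optional<float> ConeCosHalfAngle(const std::vector<Vec3>& unitNormals, Vec3 axis) {
    if (unitNormals.empty()) return std::nullopt;

    float cosHalfAngle = 0.0f;  // 90 degrees, the widest cone any plane allows
    for (Vec3 n : unitNormals) {
        const float cosAlpha = -Dot(n, axis);  // angle between axis and -n
        if (cosAlpha <= 0.0f) return std::nullopt;  // axis is not behind this plane
        const float sinAlpha = std::sqrt(std::max(0.0f, 1.0f - cosAlpha * cosAlpha));
        cosHalfAngle = std::max(cosHalfAngle, sinAlpha);  // narrow the cone
    }
    return cosHalfAngle;
}
```

For a perfectly flat patch the result is 0, so the cone degenerates into the whole negative half-space; the more the normals spread, the narrower the cone becomes.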

At run-time, all that is needed is a point-in-cone check, which is super-cheap: a vector subtraction and a dot product are enough. In order to showcase this, there’s also a new option in GeometryFX to generate some sample geometry. This option is enabled by adding “-generate-geometry:true” on the command line. With that, you’ll get a set of wavy patches rendered.
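The run-time check might look like the sketch below. `CullCone` and `IsInsideCone` are hypothetical names, and the cone is assumed to be stored as apex, unit axis and cosine of the half-angle. This version spends a second dot product on the squared distance to avoid a square root, so it is slightly more work than the bare minimum described above.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical per-chunk data produced by the pre-process.
struct CullCone {
    Vec3 apex;           // point inside the safe space
    Vec3 axis;           // unit direction pointing further into the safe space
    float cosHalfAngle;  // in [0, 1]; 0 means a 90 degree half-angle
};

// True if `eye` lies strictly inside the cone, i.e. the chunk can be culled.
inline bool IsInsideCone(const CullCone& cone, Vec3 eye) {
    assert(cone.cosHalfAngle >= 0.0f && cone.cosHalfAngle <= 1.0f);
    const Vec3 toEye = eye - cone.apex;
    const float along = Dot(toEye, cone.axis);
    if (along <= 0.0f) return false;  // at the apex or on the wrong side of it
    // Compare squared quantities: along / |toEye| >= cosHalfAngle.
    const float c = cone.cosHalfAngle;
    return along * along >= c * c * Dot(toEye, toEye);
}
```

On the CPU this would run once per chunk while a draw is split up, and chunks for which it returns true are simply never added to the batch.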

[Image: GeometryFX 1.2 sample geometry, a set of wavy patches]

With backface cluster culling turned on, roughly 30% of the geometry in the view above is removed, with no visual difference and a 30% performance uplift. The CPU overhead due to the cluster culling is not measurable, so you can keep it always on: in the worst case it culls nothing and performance stays the same.

[Image: the same view with roughly 30% of the geometry culled]

Besides the cluster culling, this update also incorporates a pull request from Intel which improves the handling of primitives culling the near plane. For more information, visit the GeometryFX page. That’s it for today!

Matthaeus Chajdas
Matthäus Chajdas is a developer technology engineer at AMD. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.
