Optimizing GPU occupancy and resource usage with large thread groups
When using a compute shader, it is important to consider the impact of thread group size on performance. Limited register space, memory latency and SIMD occupancy each affect shader performance in different ways. This article discusses potential performance issues, along with techniques and optimizations that can dramatically increase performance when applied correctly.

Good to see dithering finally getting some attention. I’ve been using it in my video renderer (madVR) for more than 7 years now. FWIW, your random noise image looks much worse than it should/could. Here’s what proper TPDF dithering (using a white noise texture) looks like at 3-bit:
http://madshi.net/LottesGrainTPDF.png
Of course, the noise level is still pretty high. If you prefer lower noise levels, try high-quality ordered dithering like this:
http://madshi.net/LottesGrainOrdered.png
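
For readers who want to experiment with the two approaches mentioned in this comment, here is a minimal Python sketch. It is not madVR's implementation, which is not shown here. The function names are hypothetical, the TPDF noise amplitude of plus or minus one quantization step is an assumed (though common) choice, a pseudo-random generator stands in for the white noise texture, and the 4x4 Bayer matrix is only a simple stand-in for whatever ordered pattern produced the linked image.

```python
import math
import random

# Illustrative sketch only: names and parameters are assumptions,
# not taken from madVR or from the article.

def quantize(value, bits):
    """Round a value in [0, 1] to the nearest of 2**bits output levels."""
    levels = (1 << bits) - 1
    q = math.floor(value * levels + 0.5)
    q = min(max(q, 0), levels)  # clamp to the representable range
    return q / levels

def tpdf_noise(rng):
    """Triangular-PDF noise in (-1, 1): the sum of two independent
    uniform samples, each in [-0.5, 0.5)."""
    return (rng.random() - 0.5) + (rng.random() - 0.5)

def dither_tpdf(value, bits, rng):
    """Add TPDF noise of +/- one quantization step, then quantize."""
    levels = (1 << bits) - 1
    return quantize(value + tpdf_noise(rng) / levels, bits)

# Classic 4x4 Bayer threshold matrix (assumed here as a simple example
# of an ordered pattern).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def dither_ordered(value, bits, x, y):
    """Add a position-dependent offset in (-0.5, 0.5) steps, then quantize."""
    levels = (1 << bits) - 1
    offset = (BAYER_4X4[y % 4][x % 4] + 0.5) / 16.0 - 0.5
    return quantize(value + offset / levels, bits)

if __name__ == "__main__":
    rng = random.Random(1234)
    target = 0.4  # a grey level that 3 bits cannot represent exactly
    tpdf = [dither_tpdf(target, 3, rng) for _ in range(20000)]
    tile = [dither_ordered(target, 3, x, y) for y in range(4) for x in range(4)]
    print("plain quantize:", quantize(target, 3))
    print("TPDF mean:     ", sum(tpdf) / len(tpdf))
    print("ordered mean:  ", sum(tile) / len(tile))
```

In both cases the individual pixels still land on one of the eight 3-bit levels, but their average over an area approaches the original value. The ordered pattern does this with a fixed, repeating offset, so it contains no random component.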