Full-speed, out-of-order rasterization

If you’re familiar with graphics APIs, you’re certainly aware of the API ordering guarantees. At their core, these guarantees mean that if you put two triangles into the pipeline one after the other, they will also end up in the framebuffer in exactly the same order. This makes it possible, for instance, to sort transparent geometry by depth and get the correct blending.
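
To see why order matters for blending, here is a minimal sketch of the classic "over" blend on a single color channel. It is not tied to any graphics API; the names `Fragment` and `blendOver` and the sample values are made up for illustration.

```cpp
#include <cassert>

// One color channel plus coverage; hypothetical type for this sketch only.
struct Fragment {
    float color;
    float alpha;
};

// Classic "over" blend: result = src * alpha + dst * (1 - alpha).
float blendOver(float dst, Fragment src) {
    assert(src.alpha >= 0.0f && src.alpha <= 1.0f);
    return src.color * src.alpha + dst * (1.0f - src.alpha);
}
```

Blending a bright fragment `{1.0f, 0.5f}` and then a dark fragment `{0.0f, 0.5f}` over a black background gives 0.25, while the opposite order gives 0.5. The framebuffer contents therefore depend on the order in which the two triangles arrive.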

While this guarantee is necessary for correctness in cases like these, it’s often an unnecessary constraint. If you’re laying down a G-Buffer without blending, for example, you typically don’t care about a specific rasterization order. The same commonly applies to depth-only rendering operations. For those cases, GCN hardware supports a special “out-of-order” rasterization mode which does exactly what the name implies: it relaxes the ordering guarantee and allows fragments to be produced out of order. This can improve efficiency in various cases, and in fact, the driver will try to enable it automatically when it is safe to do so.

Comparison of strict and relaxed order rendering in two cases. On the left-hand side, you can see two overlapping primitives being rasterized – the arrow indicates the view direction. With relaxed ordering there is no tie-breaker rule, so the output can vary depending on how the hardware decides to process the triangles. On the right-hand side, with non-overlapping primitives, the result is well defined in both cases.

However, there are some cases where forcing out-of-order rasterization at the driver level is not safe, for instance when you’re rendering with a less-or-equal depth test. In this case, out-of-order rendering can produce different results, as any geometry which is Z-fighting in the less-or-equal case is no longer guaranteed to resolve the same way. If you don’t care about the specific pattern of your Z-buffer artifacts (or you know that your scene doesn’t produce them), enabling out-of-order rasterization manually is fine – but it’s a case where the driver can’t do it for you.
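
The Z-fighting case can be illustrated with a small CPU-side model of a single pixel. This is only a sketch under simplifying assumptions (one pixel, depth writes enabled, no blending); `Fragment`, `Pixel` and `resolveLessOrEqual` are hypothetical names and not part of Vulkan.

```cpp
#include <cassert>
#include <vector>

struct Fragment {
    float depth;  // 0 = near plane, 1 = far plane
    int color;    // stand-in for an RGBA value
};

struct Pixel {
    float depth;
    int color;
};

// Applies fragments in the given order using a less-or-equal depth test
// with depth writes enabled, starting from a pixel cleared to the far plane.
Pixel resolveLessOrEqual(const std::vector<Fragment>& fragments) {
    Pixel pixel = {1.0f, 0};
    for (const Fragment& f : fragments) {
        assert(f.depth >= 0.0f && f.depth <= 1.0f);
        if (f.depth <= pixel.depth) {
            pixel.depth = f.depth;
            pixel.color = f.color;
        }
    }
    return pixel;
}
```

With distinct depths (say 0.3 and 0.6), either submission order leaves the nearer fragment in the pixel. With two fragments at the same depth, the final depth is still identical, but the color that survives is whichever fragment happened to be processed last.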

How to use it

Today, we’re introducing a new Vulkan extension, VK_AMD_rasterization_order, which allows you to control out-of-order rendering on a per-draw-call basis. It’s a new rasterization state, which you can turn on for everything that does not require strict primitive ordering. This generally includes every G-Buffer pass, all shadow map rendering, and passes that use commutative blending. In those cases, you can turn on RELAXED order.
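
As a rough way to reason about whether a blend mode is commutative, the following sketch applies a blend function to a handful of source values in every possible order and checks that the result never changes. It is a brute-force CPU model, not how a driver decides; `BlendFn`, `isOrderIndependent` and the three blend helpers are hypothetical names.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

using BlendFn = std::function<float(float dst, float src)>;

// Returns true if blending `sources` onto `initial` gives the same value
// for every ordering of `sources`.
bool isOrderIndependent(std::vector<float> sources, float initial,
                        const BlendFn& blend) {
    // std::next_permutation visits all orderings when started from sorted input.
    std::sort(sources.begin(), sources.end());
    bool haveReference = false;
    float reference = 0.0f;
    do {
        float dst = initial;
        for (float src : sources) {
            dst = blend(dst, src);
        }
        if (!haveReference) {
            reference = dst;
            haveReference = true;
        } else if (dst != reference) {
            return false;
        }
    } while (std::next_permutation(sources.begin(), sources.end()));
    return true;
}

float blendAdd(float dst, float src) { return dst + src; }
float blendMax(float dst, float src) { return std::max(dst, src); }
float blendReplace(float /*dst*/, float src) { return src; }
```

For the sources 0.125, 0.25 and 0.5 (chosen because their sums are exactly representable, so rounding does not interfere), `blendAdd` and `blendMax` are order independent, while `blendReplace`, which models an opaque write with no depth test, is not.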

There’s no downside to using RELAXED – performance will be the same as or better than in STRICT ordering mode. You also keep the benefits of triangle order optimization tools like AMD Tootle, as out-of-order execution still roughly follows the original input order.

To actually use it in Vulkan, you first need to enable it when creating the device – remember that in Vulkan, only extensions that have been enabled during device creation can actually be used. Once that is done, it hooks into the normal Vulkan extension mechanism. As this is a rasterizer state, you set the pNext field of a VkPipelineRasterizationStateCreateInfo and link it to the structure specified in the extension. For example:


// Extension structure requesting relaxed ordering for this pipeline.
VkPipelineRasterizationStateRasterizationOrderAMD orderAMD = {};
orderAMD.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_RASTERIZATION_ORDER_AMD;
orderAMD.rasterizationOrder = VK_RASTERIZATION_ORDER_RELAXED_AMD;

// Zero-initialize the regular rasterizer state, chain the extension
// structure via pNext, then fill in the remaining fields as usual.
VkPipelineRasterizationStateCreateInfo rasterizationStateCreateInfo = {};
rasterizationStateCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO;
rasterizationStateCreateInfo.pNext = &orderAMD;

What kind of performance can you expect? We’ve seen increases in the 10% range. Of course, you won’t see any benefit where the driver has already enabled it automatically (for instance, for depth-only rendering). In nearly all other cases, the driver has to play it safe and cannot enable it, even though there wouldn’t be any visible artifacts. With this extension, the application can decide whether relaxed order rendering is sufficient and reap the performance benefits.

In order to use the new extension, you need the Crimson 16.5.2 driver or later – the extension is already implemented and exposed. You can find the extension documentation here; the extension itself is very simple and consists of a new rasterization state. If you want to try it out right away, we’ve also got you covered with our out-of-order rasterization sample application.
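
Because the extension is only exposed by drivers that implement it and has to be enabled at device creation, an application will typically check for it first and fall back to strict ordering otherwise. The sketch below shows only that decision logic on plain strings so it stays independent of the Vulkan headers. It assumes the list of available names was obtained elsewhere (in a real application, from vkEnumerateDeviceExtensionProperties) and that the returned list is later passed to VkDeviceCreateInfo::ppEnabledExtensionNames. The helper names are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Extension name as given in the article.
const char* const kRasterizationOrderExtension = "VK_AMD_rasterization_order";

// True if the device reports the rasterization order extension.
bool supportsRelaxedOrder(const std::vector<std::string>& available) {
    return std::find(available.begin(), available.end(),
                     kRasterizationOrderExtension) != available.end();
}

// Returns the extensions to enable at device creation: everything the
// application requires, plus the rasterization order extension if present.
std::vector<std::string> extensionsToEnable(
        std::vector<std::string> required,
        const std::vector<std::string>& available) {
    if (supportsRelaxedOrder(available)) {
        required.push_back(kRasterizationOrderExtension);
    }
    return required;
}
```

Pipelines should only chain the VkPipelineRasterizationStateRasterizationOrderAMD structure when `supportsRelaxedOrder` returned true and the extension was actually enabled on the device.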

Other content by Matthäus Chajdas