Introducing HIP RT v2.1 - batch construction for small geometries, transformation query functions, and more

Originally posted: October 27, 2023

Last updated: June 26, 2024

Daniel Meister

We are thrilled to announce the release of HIP RT v2.1. In this blog, we discuss new functionality and corresponding API changes.

Batch construction

In previous versions, individual bottom-level geometries were constructed one by one, which can be inefficient for a large number of small geometries. We introduce batch construction, which builds many small geometries efficiently in a single kernel launch. We added the following functions:

hiprtError hiprtCreateGeometries(...);
hiprtError hiprtDestroyGeometries(...);
hiprtError hiprtBuildGeometries(...);
hiprtError hiprtGetGeometriesBuildTemporaryBufferSize(...);

These functions perform the same operations as the corresponding single-geometry variants but process multiple geometries at once. For example, hiprtCreateGeometries takes multiple build inputs and creates multiple geometries efficiently using a single malloc call. Similarly, hiprtDestroyGeometries destroys multiple geometries at once. For the construction itself, hiprtBuildGeometries takes multiple build inputs and builds multiple geometries at once. Small geometries (up to 512 geometric primitives) are constructed in one kernel launch, while larger geometries are processed one by one using the specified build quality. Note that HIP RT internally separates small and large geometries, so the user does not need to do so explicitly. The maximum size of small geometries (i.e., geometries whose primitive count is less than or equal to this value are processed by the batch construction) can be specified in the build options:

struct hiprtBuildOptions
{
    hiprtBuildFlags buildFlags;
    u32 batchBuildMaxPrimCount;
};

If batchBuildMaxPrimCount == 0, batch construction is disabled, and all geometries are processed sequentially. A caveat is that batch construction internally uses a modified version of the fast build, which may slightly reduce the quality of the acceleration structure and, in turn, ray tracing performance. Nonetheless, we believe that the impact is negligible because the geometries are very small.
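The split that HIP RT performs internally can be pictured with a small host-side sketch. This is not HIP RT code: BuildPartition and partitionByPrimCount are hypothetical names introduced for illustration. Only the threshold rule comes from the description above (a primitive count less than or equal to batchBuildMaxPrimCount goes to the batch, and a value of 0 disables batching).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper types, not part of the HIP RT API.
struct BuildPartition
{
    std::vector<std::size_t> batched;    // indices of geometries built together in one launch
    std::vector<std::size_t> sequential; // indices of geometries built one by one
};

// Splits geometries by primitive count using the rule described in the text.
BuildPartition partitionByPrimCount(
    const std::vector<std::uint32_t>& primCounts, std::uint32_t batchBuildMaxPrimCount )
{
    BuildPartition partition;
    for ( std::size_t i = 0; i < primCounts.size(); ++i )
    {
        // A threshold of 0 disables batching, so everything becomes sequential.
        const bool isSmall = batchBuildMaxPrimCount != 0 && primCounts[i] <= batchBuildMaxPrimCount;
        if ( isSmall )
            partition.batched.push_back( i );
        else
            partition.sequential.push_back( i );
    }
    return partition;
}
```

With a threshold of 512, geometries with 12 and 512 primitives would land in the batch, while geometries with 513 or 100000 primitives would be built individually.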

Global and dynamic stacks

The global stack efficiently combines shared memory and global memory. While the shared buffer allocation is relatively straightforward, determining the size of the global buffer is rather complicated. We decided to change the API to make the allocation more user-friendly. We introduce two new structures representing both buffer types:

struct hiprtGlobalStackBuffer
{
    u32 stackSize;
    u32 stackCount;
    void* stackData;
};

struct hiprtSharedStackBuffer
{
    u32 stackSize;
    void* stackData;
};

Both structures encapsulate the buffer address and the stack size. The global stack buffer additionally stores the stack count, which defines how many stacks we need (typically one per scheduled thread). The global buffer can be created and destroyed via the following functions:

hiprtError hiprtCreateGlobalStackBuffer(hiprtContext context, const hiprtGlobalStackBufferInput& input, hiprtGlobalStackBuffer* stackBufferOut);

hiprtError hiprtDestroyGlobalStackBuffer(hiprtContext context, hiprtGlobalStackBuffer stackBuffer);

struct hiprtGlobalStackBufferInput
{
    hiprtStackType type = hiprtStackTypeGlobal;
    u32 stackSize;
    u32 threadCount;
};

Besides the type (which we discuss below), we only need to specify the stack size and the number of scheduled threads. With both buffers allocated, we can finally create a stack object:

hiprtGlobalStackBuffer globalStackBuffer = ...;
hiprtSharedStackBuffer sharedStackBuffer = ...;
hiprtGlobalStack stack(globalStackBuffer, sharedStackBuffer);
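To make the idea of combining a small fast buffer with a larger backing buffer concrete, here is a conceptual host-side model. It is only a sketch under assumptions: TwoTierStack is a hypothetical class, the "shared" and "global" tiers are plain vectors, and the spill policy (fill the shared tier first, then overflow into the global tier) is an assumed reading of the description above rather than HIP RT's actual device implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical conceptual model; the real stack lives in device code.
class TwoTierStack
{
  public:
    TwoTierStack( std::size_t sharedSize, std::size_t globalSize )
        : m_shared( sharedSize ), m_global( globalSize ) {}

    // Pushes into the shared tier first and spills into the global tier.
    // Returns false if both tiers are full.
    bool push( std::uint32_t value )
    {
        if ( m_count < m_shared.size() )
            m_shared[m_count] = value;
        else if ( m_count - m_shared.size() < m_global.size() )
            m_global[m_count - m_shared.size()] = value;
        else
            return false;
        ++m_count;
        return true;
    }

    // Pops the most recently pushed value. Returns false if the stack is empty.
    bool pop( std::uint32_t& value )
    {
        if ( m_count == 0 ) return false;
        --m_count;
        value = m_count < m_shared.size() ? m_shared[m_count] : m_global[m_count - m_shared.size()];
        return true;
    }

    // Number of entries currently stored in the global (slow) tier.
    std::size_t spilled() const { return m_count > m_shared.size() ? m_count - m_shared.size() : 0; }

  private:
    std::vector<std::uint32_t> m_shared; // stands in for shared memory
    std::vector<std::uint32_t> m_global; // stands in for global memory
    std::size_t                m_count = 0;
};
```

In this model, shallow traversals never touch the slower tier, which is the motivation for pairing the two buffers.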

The global stack buffer contains stacks for all scheduled threads, which might be wasteful as only a fraction of the threads is executing concurrently. We introduce the dynamic stack, which allocates stacks only for active threads and assigns them to threads on demand. HIP RT handles the whole process internally in the stack constructor. The dynamic stack is created in the same manner as the global stack; we only need to change the type in hiprtGlobalStackBufferInput to hiprtStackTypeDynamic (we do not need to set threadCount):

hiprtDynamicStack stack(globalStackBuffer, sharedStackBuffer);

Naturally, this brings some additional overhead, slightly increasing the register usage. We provide the dynamic stack as an option for systems with limited memory.
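A back-of-the-envelope calculation shows why the memory saving can matter. The sketch below is an arithmetic illustration only: stackBufferBytes is a hypothetical helper, the 4-byte entry size is an assumption, and the number of concurrently active threads is a made-up figure, since the post does not state how HIP RT sizes the dynamic stack buffer.

```cpp
#include <cstdint>

// Assumption: each stack entry is one 32-bit value.
constexpr std::uint64_t kEntryBytes = sizeof( std::uint32_t );

// Bytes needed for stackCount stacks of stackSize entries each (hypothetical helper).
constexpr std::uint64_t stackBufferBytes( std::uint32_t stackSize, std::uint32_t stackCount )
{
    return static_cast<std::uint64_t>( stackSize ) * stackCount * kEntryBytes;
}

// One stack per scheduled thread for a 1920x1080 launch with 64 entries per stack.
constexpr std::uint64_t kGlobalBytes = stackBufferBytes( 64, 1920 * 1080 );

// One stack per concurrently active thread; 65536 is an illustrative guess.
constexpr std::uint64_t kDynamicBytes = stackBufferBytes( 64, 65536 );
```

Under these assumptions, the per-thread layout needs 530,841,600 bytes (about 506 MiB), while the per-active-thread layout needs 16,777,216 bytes (16 MiB).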

Transformation query functions

For some shading calculations, we need a transformation from or to object space. We could store these transformations explicitly in a separate buffer, but this is wasteful as hiprtScene already contains the transformation data. We provide functions that allow querying these transformations from the scene object:

hiprtFrameSRT hiprtGetObjectToWorldFrameSRT(hiprtScene scene, u32 instanceID, float time);
hiprtFrameSRT hiprtGetWorldToObjectFrameSRT(hiprtScene scene, u32 instanceID, float time);
hiprtFrameMatrix hiprtGetObjectToWorldFrameMatrix(hiprtScene scene, u32 instanceID, float time);
hiprtFrameMatrix hiprtGetWorldToObjectFrameMatrix(hiprtScene scene, u32 instanceID, float time);

Notice that the functions take a time parameter. This is especially handy for motion blur, as HIP RT interpolates the transformations internally for the given time.
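To illustrate what a time-dependent query means, the following sketch linearly interpolates a translation between two key frames. It is a simplified stand-in, not HIP RT's implementation: Vec3, KeyFrame, and translationAt are hypothetical names, and the post does not describe how HIP RT interpolates rotation and scale, so those components are left out.

```cpp
#include <algorithm>

// Hypothetical minimal types for illustration only.
struct Vec3
{
    float x, y, z;
};

struct KeyFrame
{
    Vec3  translation;
    float time;
};

// Linear interpolation between two vectors with t in [0, 1].
Vec3 lerp( const Vec3& a, const Vec3& b, float t )
{
    return { a.x + t * ( b.x - a.x ), a.y + t * ( b.y - a.y ), a.z + t * ( b.z - a.z ) };
}

// Translation at the given time, clamped to the interval covered by the key frames.
Vec3 translationAt( const KeyFrame& k0, const KeyFrame& k1, float time )
{
    if ( k1.time <= k0.time ) return k0.translation; // degenerate interval
    float t = ( time - k0.time ) / ( k1.time - k0.time );
    t       = std::min( std::max( t, 0.0f ), 1.0f );
    return lerp( k0.translation, k1.translation, t );
}
```

A ray with a time stamp halfway between the two key frames would see the instance translated halfway between the two key positions.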

Other features

We optimized the radix sort in Orochi. This improves the construction speed of the fast and balanced builds (both builds rely on radix sort).
We use tighter boxes for transformed instances in the top-level scene object, leading to higher ray tracing performance.
We fixed the geometry IO functions (hiprtSaveGeometry and hiprtLoadGeometry). Note that the scene IO functions are still not functional.
We added an option to enable/disable caching of the compiled trace kernels in hiprtBuildTraceKernels and hiprtBuildTraceKernelsFromBitcode.

Download it today

The download link for HIP RT v2.1 is available on the HIP RT page.

If you’re looking for some guidance on getting started with HIP RT, check out the HIP RT SDK tutorials repository and the HIP RT documentation page.

Daniel Meister is a researcher and software engineer at AMD. His research interests include real-time ray tracing, acceleration data structures, global illumination, GPGPU, and machine learning for rendering.