
HIP Ray Tracing
HIP RT is a ray tracing library for HIP, making it easy to write ray tracing applications in HIP.
We are thrilled to announce the release of HIP RT v2.1. In this blog post, we discuss the new functionality and the corresponding API changes.
In previous versions, bottom-level geometries were constructed one by one, which can be inefficient for a large number of small geometries. We introduce batch construction, which builds many small geometries efficiently in a single kernel launch. We added the following functions:
hiprtError hiprtCreateGeometries(...);
hiprtError hiprtDestroyGeometries(...);
hiprtError hiprtBuildGeometries(...);
hiprtError hiprtGetGeometriesBuildTemporaryBufferSize(...);
These functions perform the same operations as their single-geometry counterparts but process multiple geometries at once. For example, hiprtCreateGeometries takes multiple build inputs and creates multiple geometries using a single malloc call. Similarly, hiprtDestroyGeometries destroys multiple geometries at once. For the construction itself, hiprtBuildGeometries takes multiple build inputs and builds multiple geometries at once. Small geometries (up to 512 geometric primitives) are constructed in a single kernel launch, while larger geometries are processed one by one using the specified build quality. Note that HIP RT separates small and large geometries internally, so the user does not need to do so explicitly. The maximum size of a small geometry (i.e., geometries with at most this many primitives are processed by the batch construction) can be specified in the build options:
struct hiprtBuildOptions
{
    hiprtBuildFlags buildFlags;
    u32 batchBuildMaxPrimCount;
};
If batchBuildMaxPrimCount == 0, batch construction is disabled and all geometries are processed sequentially. One caveat is that batch construction internally uses a modified version of the fast build, which may slightly reduce the quality of the acceleration structure and thus ray tracing performance. Nonetheless, we expect this impact to be negligible because the geometries are very small.
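To make the split concrete, here is a minimal C++ sketch of the partitioning rule described above, written as plain host code without HIP RT. The GeometryInput, BuildPlan, and planBuild names are hypothetical, and the sketch only models which path a geometry would take under a given batchBuildMaxPrimCount; it does not reproduce how HIP RT partitions its inputs internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one build input; only the primitive count matters here.
struct GeometryInput {
    uint32_t primCount;
};

// Hypothetical result of the split: indices of geometries built together vs. one by one.
struct BuildPlan {
    std::vector<size_t> batched;     // would be built in a single kernel launch
    std::vector<size_t> sequential;  // would be built individually with the requested quality
};

// Models the rule from the text: a geometry takes the batch path if it has at most
// batchBuildMaxPrimCount primitives; a threshold of 0 disables batching entirely.
BuildPlan planBuild(const std::vector<GeometryInput>& inputs, uint32_t batchBuildMaxPrimCount) {
    BuildPlan plan;
    for (size_t i = 0; i < inputs.size(); ++i) {
        bool isSmall = batchBuildMaxPrimCount > 0 && inputs[i].primCount <= batchBuildMaxPrimCount;
        (isSmall ? plan.batched : plan.sequential).push_back(i);
    }
    // Every input is assigned to exactly one path.
    assert(plan.batched.size() + plan.sequential.size() == inputs.size());
    return plan;
}
```

With a threshold of 512, inputs with 12, 512, and 64 primitives would go to the batch path, while inputs with 513 and 100000 primitives would be built one by one; a threshold of 0 sends every input down the sequential path.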
The global stack efficiently combines shared memory and global memory. While allocating the shared buffer is relatively straightforward, determining the size of the global buffer is rather complicated. We therefore changed the API to make the allocation more user-friendly, introducing two new structures that represent the two buffer types:
struct hiprtGlobalStackBuffer
{
    u32 stackSize;
    u32 stackCount;
    void* stackData;
};

struct hiprtSharedStackBuffer
{
    u32 stackSize;
    void* stackData;
};
Both structures encapsulate the buffer address and the stack size. The global stack buffer additionally has a stack count, which defines how many stacks we need (typically one per scheduled thread). The global buffer can be created and destroyed via the following functions:
hiprtError hiprtCreateGlobalStackBuffer(hiprtContext context, const hiprtGlobalStackBufferInput& input, hiprtGlobalStackBuffer* stackBufferOut);
hiprtError hiprtDestroyGlobalStackBuffer(hiprtContext context, hiprtGlobalStackBuffer stackBuffer);
struct hiprtGlobalStackBufferInput
{
    hiprtStackType type = hiprtStackTypeGlobal;
    u32 stackSize;
    u32 threadCount;
};
Besides the type (discussed below), we only need to specify the stack size and the number of scheduled threads. With both buffers allocated, we can finally create a stack object:
hiprtGlobalStackBuffer globalStackBuffer = ...;
hiprtSharedStackBuffer sharedStackBuffer = ...;
hiprtGlobalStack stack(globalStackBuffer, sharedStackBuffer);
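One way to picture how the two buffers cooperate is a stack that fills a small per-thread shared-memory region first and spills deeper entries into that thread's slice of the global buffer. The sketch below models this on the CPU with ordinary arrays; the HybridStack class, the globalSlice helper, the 4-byte entry type, and the slice layout are assumptions for illustration, not HIP RT's implementation of hiprtGlobalStack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout assumption: thread `threadIndex` owns a contiguous slice of
// `stackSize` entries in a global buffer holding stackCount such slices.
uint32_t* globalSlice(std::vector<uint32_t>& globalData, uint32_t stackSize, uint32_t threadIndex) {
    assert((threadIndex + 1) * static_cast<size_t>(stackSize) <= globalData.size());
    return globalData.data() + static_cast<size_t>(threadIndex) * stackSize;
}

// Simplified CPU model (hypothetical, not HIP RT code): pushes go to the small
// "shared" region until it is full, then overflow into the thread's global slice.
class HybridStack {
public:
    HybridStack(uint32_t* shared, uint32_t sharedSize, uint32_t* global, uint32_t globalSize)
        : shared_(shared), sharedSize_(sharedSize), global_(global), globalSize_(globalSize) {}

    void push(uint32_t value) {
        if (sharedCount_ < sharedSize_) {
            shared_[sharedCount_++] = value;  // fast path: shared memory
        } else {
            assert(globalCount_ < globalSize_ && "stack overflow");
            global_[globalCount_++] = value;  // overflow path: global memory
        }
    }

    uint32_t pop() {
        assert(!empty());
        // Spilled entries are the most recent ones, so drain them first to keep LIFO order.
        if (globalCount_ > 0) return global_[--globalCount_];
        return shared_[--sharedCount_];
    }

    bool empty() const { return sharedCount_ == 0 && globalCount_ == 0; }
    uint32_t size() const { return sharedCount_ + globalCount_; }

private:
    uint32_t* shared_;
    uint32_t sharedSize_;
    uint32_t sharedCount_ = 0;
    uint32_t* global_;
    uint32_t globalSize_;
    uint32_t globalCount_ = 0;
};
```

Because pushes spill only once the shared part is full and pops drain the spilled entries first, the combined structure keeps strict LIFO order, and traversals whose stack depth stays within the shared part never touch global memory.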
The global stack buffer contains stacks for all scheduled threads, which can be wasteful because only a fraction of those threads execute concurrently. We therefore introduce the dynamic stack, which allocates stacks only for threads that can be active at the same time and assigns the stacks to threads dynamically on demand. HIP RT handles the whole process internally in the stack constructor. The dynamic stack is created in the same manner as the global stack; we only need to change the type in hiprtGlobalStackBufferInput to hiprtStackTypeDynamic (threadCount does not need to be set):
hiprtDynamicStack stack(globalStackBuffer, sharedStackBuffer);
Naturally, this brings some additional overhead and slightly increases register usage. We provide the dynamic stack as an option for systems with limited memory.
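The memory argument can be illustrated with a little arithmetic plus a CPU analogy of on-demand assignment. The sketch assumes 4-byte stack entries and uses made-up device figures (60 compute units with 2048 resident threads each); stackBufferBytes and StackSlotPool are hypothetical names, and the atomic slot pool only illustrates the idea of claiming and releasing stacks, not the mechanism HIP RT uses inside hiprtDynamicStack.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>

// Bytes for `stackCount` stacks of `stackSize` entries, assuming 4-byte entries.
constexpr uint64_t stackBufferBytes(uint64_t stackSize, uint64_t stackCount) {
    return stackSize * stackCount * sizeof(uint32_t);
}

// Hypothetical CPU analogy of on-demand stack assignment: the pool is sized for the
// threads that can run concurrently; a thread claims a free slot and later releases it.
class StackSlotPool {
public:
    explicit StackSlotPool(uint32_t slotCount)
        : count_(slotCount), busy_(new std::atomic<bool>[slotCount]) {
        for (uint32_t i = 0; i < count_; ++i) busy_[i].store(false);
    }

    // Scans from `hint` and atomically claims the first free slot; returns -1 if all are taken.
    int64_t acquire(uint32_t hint) {
        for (uint32_t k = 0; k < count_; ++k) {
            uint32_t i = (hint + k) % count_;
            bool expected = false;
            if (busy_[i].compare_exchange_strong(expected, true)) return i;
        }
        return -1;
    }

    // Marks a previously claimed slot as free again.
    void release(uint32_t slot) {
        assert(slot < count_ && busy_[slot].load());
        busy_[slot].store(false);
    }

private:
    uint32_t count_;
    std::unique_ptr<std::atomic<bool>[]> busy_;
};
```

Under these assumptions, 64-entry stacks for a 1920x1080 launch would need 530,841,600 bytes (about 506 MiB) with one stack per scheduled thread, but only 31,457,280 bytes (30 MiB) if stacks are provisioned only for the 122,880 threads that can be resident at once. The price is the extra claim/release logic, which matches the overhead mentioned above.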
For some shading calculations, we need a transformation to or from object space. We could store these transformations explicitly in a separate buffer, but this is wasteful because hiprtScene already contains the transformation data. We therefore provide functions that query these transformations from the scene object:
hiprtFrameSRT hiprtGetObjectToWorldFrameSRT(hiprtScene scene, u32 instanceID, float time);
hiprtFrameSRT hiprtGetWorldToObjectFrameSRT(hiprtScene scene, u32 instanceID, float time);
hiprtFrameMatrix hiprtGetObjectToWorldFrameMatrix(hiprtScene scene, u32 instanceID, float time);
hiprtFrameMatrix hiprtGetWorldToObjectFrameMatrix(hiprtScene scene, u32 instanceID, float time);
Notice that the functions take a time parameter. This is especially handy for motion blur, as HIP RT interpolates the transformations internally.
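The following C++ sketch shows what time-dependent interpolation of an SRT frame can look like: scale and translation are interpolated linearly, rotation with quaternion slerp, and points are then mapped to world space and back. The FrameSRT, Quat, and Vec3 types are hypothetical; in particular, the unit-quaternion rotation is an assumption that may not match the layout of hiprtFrameSRT or the exact interpolation HIP RT performs.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };  // unit quaternion (assumed rotation representation)

// Hypothetical keyframe: scale, rotation, and translation at a given time.
struct FrameSRT {
    Vec3 scale;
    Quat rotation;
    Vec3 translation;
    float time;
};

Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
}

// Spherical linear interpolation between unit quaternions.
Quat slerp(Quat a, Quat b, float t) {
    float d = a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z;
    if (d < 0.0f) { b = {-b.w, -b.x, -b.y, -b.z}; d = -d; }  // take the shorter arc
    if (d > 0.9995f) {  // nearly parallel: normalized linear interpolation is stable
        Quat q{a.w + (b.w - a.w) * t, a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
        float n = std::sqrt(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z);
        return {q.w / n, q.x / n, q.y / n, q.z / n};
    }
    float theta = std::acos(d);
    float s = std::sin(theta);
    float wa = std::sin((1.0f - t) * theta) / s;
    float wb = std::sin(t * theta) / s;
    return {wa * a.w + wb * b.w, wa * a.x + wb * b.x, wa * a.y + wb * b.y, wa * a.z + wb * b.z};
}

// Frame at `time`, with the parameter clamped to the keyframe interval [f0.time, f1.time].
FrameSRT interpolate(const FrameSRT& f0, const FrameSRT& f1, float time) {
    assert(f1.time > f0.time);
    float t = (time - f0.time) / (f1.time - f0.time);
    t = std::fmin(std::fmax(t, 0.0f), 1.0f);
    return {lerp(f0.scale, f1.scale, t), slerp(f0.rotation, f1.rotation, t),
            lerp(f0.translation, f1.translation, t), time};
}

// Rotates v by unit quaternion q: v' = v + 2w(u x v) + 2 u x (u x v), with u = (x, y, z).
Vec3 rotate(const Quat& q, const Vec3& v) {
    Vec3 u{q.x, q.y, q.z};
    Vec3 c{u.y * v.z - u.z * v.y, u.z * v.x - u.x * v.z, u.x * v.y - u.y * v.x};
    Vec3 cc{u.y * c.z - u.z * c.y, u.z * c.x - u.x * c.z, u.x * c.y - u.y * c.x};
    return {v.x + 2 * (q.w * c.x + cc.x), v.y + 2 * (q.w * c.y + cc.y), v.z + 2 * (q.w * c.z + cc.z)};
}

// Object to world: scale, then rotate, then translate.
Vec3 objectToWorld(const FrameSRT& f, const Vec3& p) {
    Vec3 r = rotate(f.rotation, {p.x * f.scale.x, p.y * f.scale.y, p.z * f.scale.z});
    return {r.x + f.translation.x, r.y + f.translation.y, r.z + f.translation.z};
}

// World to object: undo translation, rotation (conjugate quaternion), and scale in reverse order.
Vec3 worldToObject(const FrameSRT& f, const Vec3& p) {
    Quat inv{f.rotation.w, -f.rotation.x, -f.rotation.y, -f.rotation.z};
    Vec3 r = rotate(inv, {p.x - f.translation.x, p.y - f.translation.y, p.z - f.translation.z});
    return {r.x / f.scale.x, r.y / f.scale.y, r.z / f.scale.z};
}
```

For example, halfway between an identity keyframe and a keyframe with scale 3, a 90° rotation about z, and translation (2, 4, 6), the object-space point (1, 0, 0) lands at roughly (2.414, 3.414, 3), and worldToObject maps it back to (1, 0, 0).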
hiprtSaveGeometry and hiprtLoadGeometry). Note that the scene IO functions are still not functional. hiprtBuildTraceKernels and hiprtBuildTraceKernelsFromBitcode.
The download link for HIP RT v2.1 is available on the HIP RT page.
If you’re looking for guidance on getting started with HIP RT, check out the HIP RT SDK tutorials repository and the HIP RT documentation page.