Batch construction
In previous versions, individual bottom-level geometries were constructed one by one, which can be inefficient for a large number of small geometries. We introduce batch construction, which allows us to build many small geometries efficiently in a single kernel launch. We added the following functions:
These functions perform the same operations as the corresponding single-geometry variants but process multiple geometries at once. For example, hiprtCreateGeometries takes multiple build inputs and creates multiple geometries efficiently using a single malloc call. Similarly, hiprtDestroyGeometries destroys multiple geometries at once. For the construction itself, hiprtBuildGeometries takes multiple build inputs and builds multiple geometries at once. Small geometries (up to 512 geometric primitives) are constructed in one kernel launch, while larger geometries are processed one by one using the specified build quality. Note that HIP RT internally separates small and large geometries, so the user does not need to do so explicitly. The maximum size of small geometries (i.e., geometries whose primitive count is less than or equal to this value are processed by the batch construction) can be specified in the build options:
If batchBuildMaxPrimCount == 0, the batch construction is disabled, and all geometries are processed sequentially. A caveat is that the batch construction internally uses a modified version of the fast build, which may have a slightly negative impact on the quality of the acceleration structure and ray tracing performance. Nonetheless, we believe that the negative impact is rather negligible as the geometries are very small.
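The split between the batched and sequential paths can be sketched as follows. This is only an illustration of the rule as stated in the text, not HIP RT's internal code: GeometryInfo, BuildPlan, and planBuilds are hypothetical names, and only batchBuildMaxPrimCount comes from the API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a build input; only the primitive count matters here.
struct GeometryInfo
{
    std::uint32_t primCount;
};

// Hypothetical result: indices of the geometries assigned to each build path.
struct BuildPlan
{
    std::vector<std::size_t> batched;    // built together in one kernel launch
    std::vector<std::size_t> sequential; // built one by one with the requested quality
};

// Mirrors the rule described in the text: a geometry takes the batch path if its
// primitive count is <= batchBuildMaxPrimCount; a value of 0 disables batching.
inline BuildPlan planBuilds(const std::vector<GeometryInfo>& geometries,
                            std::uint32_t batchBuildMaxPrimCount)
{
    BuildPlan plan;
    for (std::size_t i = 0; i < geometries.size(); ++i)
    {
        const bool isSmall = batchBuildMaxPrimCount != 0 &&
                             geometries[i].primCount <= batchBuildMaxPrimCount;
        (isSmall ? plan.batched : plan.sequential).push_back(i);
    }
    return plan;
}
```

With a limit of 512, geometries with 12 and 512 primitives land in the batch, while geometries with 513 or more primitives fall back to the sequential path. An application does not need to perform this split itself, since HIP RT does it internally.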
Global and dynamic stacks
The global stack efficiently combines shared memory and global memory. While the shared buffer allocation is relatively straightforward, determining the size of the global buffer is rather complicated. We decided to change the API to make the allocation more user-friendly. We introduce two new structures representing both buffer types:
Both structures encapsulate the buffer address and stack size. The global stack buffer additionally stores the stack count, which defines how many stacks are needed (typically one per scheduled thread). The global buffer can be created and destroyed via the following functions:
hiprtError hiprtCreateGlobalStackBuffer(hiprtContext context, const hiprtGlobalStackBufferInput& input, hiprtGlobalStackBuffer* stackBufferOut);
hiprtError hiprtDestroyGlobalStackBuffer(hiprtContext context, hiprtGlobalStackBuffer stackBuffer);
struct hiprtGlobalStackBufferInput
{
    hiprtStackType type = hiprtStackTypeGlobal;
    u32 stackSize;
    u32 threadCount;
};
Besides the type (which we discuss below), we only specify the stack size and the number of scheduled threads. With both allocated buffers, we can finally create a stack object:
The global stack buffer contains stacks for all scheduled threads, which might be wasteful as only a fraction of the threads is executed concurrently. We introduce the dynamic stack, which allocates stacks only for active threads and assigns them to threads on demand. HIP RT internally handles the whole process in the stack constructor. The dynamic stack is created in the same manner as the global stack; we only need to change the type in hiprtGlobalStackBufferInput to hiprtStackTypeDynamic (we do not need to set threadCount):
Naturally, this brings some additional overhead, slightly increasing the register usage. We provide the dynamic stack as an option for systems with limited memory.
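To see why the dynamic stack helps on memory-limited systems, the footprint of both variants can be estimated roughly. The sketch below is a back-of-the-envelope calculation, not HIP RT code: the helper stackBufferBytes is hypothetical, the 4-byte entry size is an assumption rather than a documented value, and the thread counts in the note that follows are made-up examples.

```cpp
#include <cassert>
#include <cstdint>

// Rough size of a stack buffer holding stackCount stacks of stackSize entries each.
// entryBytes defaults to 4 (32-bit entries), which is an assumption, not a documented value.
inline std::uint64_t stackBufferBytes(std::uint32_t stackSize,
                                      std::uint64_t stackCount,
                                      std::uint32_t entryBytes = 4)
{
    assert(entryBytes > 0);
    return static_cast<std::uint64_t>(stackSize) * stackCount * entryBytes;
}
```

Under these assumptions, a global stack with 64 entries per thread for a 1920x1080 launch (2,073,600 scheduled threads) would need 530,841,600 bytes, roughly 506 MiB. If only 65,536 threads were active at a time, a buffer sized for the active threads would need 16,777,216 bytes (16 MiB).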
Transformation query functions
For some shading calculations, we need a transformation from/to object space. We could store these transformations explicitly in a separate buffer, but this is wasteful as hiprtScene already contains the transformation data. We provide functions to query these transformations from the scene object:
hiprtFrameSRT hiprtGetObjectToWorldFrameSRT(hiprtScene scene, u32 instanceID, float time);
hiprtFrameSRT hiprtGetWorldToObjectFrameSRT(hiprtScene scene, u32 instanceID, float time);
hiprtFrameMatrix hiprtGetObjectToWorldFrameMatrix(hiprtScene scene, u32 instanceID, float time);
hiprtFrameMatrix hiprtGetWorldToObjectFrameMatrix(hiprtScene scene, u32 instanceID, float time);
Notice that the functions take a time parameter. This is especially handy for motion blur, as HIP RT correctly interpolates the transformations internally.
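As an example of such a shading calculation, a normal computed in object space can be brought to world space with the transpose of the world-to-object matrix. The sketch below is self-contained and does not call HIP RT: Vec3, Frame3x4, and objectNormalToWorld are hypothetical, and the row-major 3x4 layout (linear part in the first three columns, translation in the fourth) is an assumption about how a matrix obtained from hiprtGetWorldToObjectFrameMatrix would be read.

```cpp
#include <cmath>

struct Vec3
{
    float x, y, z;
};

// Hypothetical row-major 3x4 affine matrix: columns 0..2 hold the linear part,
// column 3 holds the translation (assumed layout).
struct Frame3x4
{
    float m[3][4];
};

// Normals transform with the inverse transpose of the object-to-world linear part,
// which equals the transpose of the world-to-object linear part.
// The translation column is ignored because normals are directions.
inline Vec3 objectNormalToWorld(const Frame3x4& worldToObject, const Vec3& n)
{
    const float (&m)[3][4] = worldToObject.m;
    Vec3 r{
        m[0][0] * n.x + m[1][0] * n.y + m[2][0] * n.z,
        m[0][1] * n.x + m[1][1] * n.y + m[2][1] * n.z,
        m[0][2] * n.x + m[1][2] * n.y + m[2][2] * n.z};

    // Renormalize, since scale and shear change the length.
    const float len = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z);
    if (len > 0.0f)
    {
        r.x /= len;
        r.y /= len;
        r.z /= len;
    }
    return r;
}
```

Using the world-to-object matrix directly avoids inverting the object-to-world matrix in the shader. For motion blur, the matrix would be queried at the ray's time so that the normal matches the interpolated instance transformation.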
Other features
- We optimized the radix sort in Orochi. This improves the construction speed of the fast and balanced builds (both builds rely on radix sort).
- We use tighter boxes for transformed instances in the top-level scene object, leading to higher ray tracing performance.
- We fixed the geometry IO functions (hiprtSaveGeometry and hiprtLoadGeometry). Note that the scene IO functions are still not functional.
- We added an option to enable/disable caching of the compiled trace kernels in hiprtBuildTraceKernels and hiprtBuildTraceKernelsFromBitcode.
Download it today
The download link for HIP RT v2.1 is available on the HIP RT page.
If you’re looking for some guidance on getting started with HIP RT, check out the HIP RT SDK tutorials repository and the HIP RT documentation page.