
Introducing HIP RT v2.1 - batch construction for small geometries, transformation query functions, and more

Daniel Meister

Daniel Meister is a researcher and software engineer at AMD. His research interests include real-time ray tracing, acceleration data structures, global illumination, GPGPU, and machine learning for rendering.

We are thrilled to announce the release of HIP RT v2.1. In this post, we discuss the new functionality and the corresponding API changes.

Batch construction

In previous versions, individual bottom-level geometries were constructed one by one, which can be inefficient for a large number of small geometries. We introduce batch construction for small geometries, allowing many small geometries to be built efficiently in a single kernel launch. We added the following functions:


hiprtError hiprtCreateGeometries(...);
hiprtError hiprtDestroyGeometries(...);
hiprtError hiprtBuildGeometries(...);
hiprtError hiprtGetGeometriesBuildTemporaryBufferSize(...);

These functions perform the same operations as the corresponding single-geometry variants, but process multiple geometries at once. For example, hiprtCreateGeometries takes multiple build inputs and creates multiple geometries efficiently using a single malloc call; similarly, hiprtDestroyGeometries destroys multiple geometries at once. For the construction itself, hiprtBuildGeometries takes multiple build inputs and builds multiple geometries at once: small geometries (up to 512 geometric primitives) are constructed in a single kernel launch, while larger geometries are processed one by one with the specified build quality. Note that HIP RT internally separates small and large geometries, so the user does not need to do so explicitly. The maximum size of a small geometry (i.e., geometries whose primitive count is less than or equal to this value are processed by the batch construction) can be specified in the build options:


struct hiprtBuildOptions
{
    hiprtBuildFlags buildFlags;
    u32 batchBuildMaxPrimCount;
};

If batchBuildMaxPrimCount == 0, the batch construction is disabled, and all geometries are processed sequentially. A caveat is that the batch construction internally uses a modified version of the fast build, which may have a slightly negative impact on the quality of the acceleration structure and ray tracing performance. Nonetheless, we believe that the negative impact is rather negligible as the geometries are very small.
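As a rough illustration, enabling the batch path only requires setting the build options. The values below are examples, and since the full parameter lists of the batch functions are not reproduced in this post, the calls themselves are only outlined in comments:

hiprtBuildOptions options;
options.buildFlags             = hiprtBuildFlagBitPreferFastBuild; // existing HIP RT build flag
options.batchBuildMaxPrimCount = 512; // geometries with <= 512 primitives take the batch path
                                      // (0 disables batch construction)

// The options are then passed, together with an array of build inputs, to
// hiprtCreateGeometries, hiprtGetGeometriesBuildTemporaryBufferSize,
// hiprtBuildGeometries, and finally hiprtDestroyGeometries
// (see the HIP RT headers for the exact parameter lists).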

Global and dynamic stacks

The global stack efficiently combines shared memory and global memory. While allocating the shared buffer is relatively straightforward, determining the size of the global buffer is rather complicated. We decided to change the API to make the allocation more user-friendly and introduce two new structures representing the two buffer types:


struct hiprtGlobalStackBuffer
{
    u32 stackSize;
    u32 stackCount;
    void* stackData;
};

struct hiprtSharedStackBuffer
{
    u32 stackSize;
    void* stackData;
};

Both structures encapsulate the buffer address and the stack size. The global stack buffer additionally has a stack count, defining how many stacks we need (typically one per scheduled thread). The global buffer can be created and destroyed via the following functions:


hiprtError hiprtCreateGlobalStackBuffer(hiprtContext context, const hiprtGlobalStackBufferInput& input, hiprtGlobalStackBuffer* stackBufferOut);

hiprtError hiprtDestroyGlobalStackBuffer(hiprtContext context, hiprtGlobalStackBuffer stackBuffer);

struct hiprtGlobalStackBufferInput
{
    hiprtStackType type = hiprtStackTypeGlobal;
    u32 stackSize;
    u32 threadCount;
};

Besides the type (which we discuss below), we define just the stack size and the number of scheduled threads. With both buffers allocated, we can finally create a stack object:


hiprtGlobalStackBuffer globalStackBuffer = ...;
hiprtSharedStackBuffer sharedStackBuffer = ...;
hiprtGlobalStack stack(globalStackBuffer, sharedStackBuffer);
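For completeness, the global stack buffer passed to the constructor above is allocated on the host via hiprtCreateGlobalStackBuffer. A minimal sketch, where the stack size and thread count are example values and context is an already created hiprtContext:

hiprtGlobalStackBufferInput input;
input.type        = hiprtStackTypeGlobal;
input.stackSize   = 64;                         // traversal stack entries per thread (example value)
input.threadCount = renderWidth * renderHeight; // one stack per scheduled thread

hiprtGlobalStackBuffer globalStackBuffer;
hiprtCreateGlobalStackBuffer(context, input, &globalStackBuffer);

// ... launch the trace kernel with globalStackBuffer ...

hiprtDestroyGlobalStackBuffer(context, globalStackBuffer);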

The global stack buffer contains stacks for all scheduled threads, which might be wasteful as only a fraction of the scheduled threads is actually executed concurrently. We introduce the dynamic stack, which allocates stacks only for the active threads and dynamically assigns them to threads on demand. HIP RT handles the whole process internally in the stack constructor. The dynamic stack is created in the same manner as the global stack; we only need to change the type in hiprtGlobalStackBufferInput to hiprtStackTypeDynamic (threadCount does not need to be set):


hiprtDynamicStack stack(globalStackBuffer, sharedStackBuffer);

Naturally, this brings some additional overhead, slightly increasing the register usage. We provide the dynamic stack as an option for systems with limited memory.
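To give an idea of how the stack is used on the device, here is a sketch of a trace kernel that combines a shared-memory cache with the global buffer. The shared-memory sizing and the CustomStack traversal type follow the pattern used in the HIP RT SDK tutorials; treat the exact names as assumptions and check the samples for the authoritative version.

constexpr unsigned int BlockSize       = 64; // threads per block (example value)
constexpr unsigned int SharedStackSize = 16; // stack entries cached in shared memory per thread

__global__ void TraceKernel(hiprtScene scene, hiprtGlobalStackBuffer globalStackBuffer)
{
    __shared__ int sharedStackCache[SharedStackSize * BlockSize];
    hiprtSharedStackBuffer sharedStackBuffer{SharedStackSize, sharedStackCache};

    hiprtGlobalStack stack(globalStackBuffer, sharedStackBuffer);

    hiprtRay ray; // fill in origin, direction, minT, and maxT per pixel

    // Assumed traversal type from the SDK samples: closest-hit traversal with a custom stack.
    hiprtSceneTraversalClosestHitCustomStack<hiprtGlobalStack> tr(scene, ray, stack);
    hiprtHit hit = tr.getNextHit();
    // ... shade using hit ...
}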

Transformation query functions

For some shading calculations, we need the transformation from/to object space. We could store these transformations explicitly in a separate buffer, but this is wasteful as hiprtScene already contains the transformation data. We provide functions that allow querying these transformations directly from the scene object:


hiprtFrameSRT hiprtGetObjectToWorldFrameSRT(hiprtScene scene, u32 instanceID, float time);
hiprtFrameSRT hiprtGetWorldToObjectFrameSRT(hiprtScene scene, u32 instanceID, float time);
hiprtFrameMatrix hiprtGetObjectToWorldFrameMatrix(hiprtScene scene, u32 instanceID, float time);
hiprtFrameMatrix hiprtGetWorldToObjectFrameMatrix(hiprtScene scene, u32 instanceID, float time);

Notice that the functions take a time parameter. This is especially handy for motion blur, as HIP RT correctly interpolates the transformations internally.
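For instance, a shading routine can fetch the interpolated matrices for the instance it hit. In the sketch below, hit.instanceID refers to the instance ID reported in the hit record and time is the same value used when tracing the ray; both are assumptions about the surrounding code:

// Query the transforms of the hit instance at the ray's time.
hiprtFrameMatrix objectToWorld = hiprtGetObjectToWorldFrameMatrix(scene, hit.instanceID, time);
hiprtFrameMatrix worldToObject = hiprtGetWorldToObjectFrameMatrix(scene, hit.instanceID, time);
// Both matrices are interpolated at 'time', so they match the transform used during traversal.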

Other features

  • We optimized the radix sort in Orochi. This improves the construction speed of the fast and balanced builds (both builds rely on radix sort).
  • We use tighter boxes for transformed instances in the top-level scene object, leading to higher ray tracing performance.
  • We fixed the geometry IO functions (hiprtSaveGeometry and hiprtLoadGeometry). Note that the scene IO functions are still not functional.
  • We added an option to enable/disable caching of the compiled trace kernels in hiprtBuildTraceKernels and hiprtBuildTraceKernelsFromBitcode.

Download it today

The download link for HIP RT v2.1 is available on the HIP RT page.

If you’re looking for guidance on getting started with HIP RT, check out the HIP RT SDK tutorials repository and the HIP RT documentation page.

