HIP Ray Tracing 2.0 introduces improved custom function handling, bitcode linking, and more

We are happy to announce the release of HIPRT-v2, which includes several new features and optimizations. In order to overcome some inefficiencies, we have made some changes to the API. In this blog post, we will provide an overview of these changes and showcase the new functionality with a few simple examples.

Custom function table

The biggest change that we made is in handling custom functions and how they are arranged in the custom function table. In previous versions, we used a function pointer to pass an intersection function for custom primitives. It turned out that the function pointers are very inefficient in HIP, causing a major bottleneck. To bypass this limitation, we pass functions names (instead of the function pointers) to HIPRT prior the compilation. The custom function table is a bounded structure that is internally constructed, allowing compiler to see the whole picture, fully optimizing the code. Thus, this solution causes almost no overhead and make the setup even simpler.

The custom function table is a two-dimensional structure where each entry is identified by a ray type index (rows) and geometry type index (columns). The table is flattened to a one-dimensional array such that the first entry correspond to the first ray type and the first geometry type, the second entry to the first ray type and the second geometry type, and so on.

Let us show how this works on a simple example. First of all, we have to set the geomType in the hiprtGeometryBuildInput prior to the hiprtGeometry construction to refer to it later in the custom function table:

Copied!

hiprtGeometryBuildInput geomInput;
geomInput.geomType = 0;

We want to use the following intersection function for our custom primitives. Notice that the function signature is different than in the previous versions, encapsulating the hit attributes in the hiprtHit struct and thus making it more compact:

Copied!

__device__ bool intersectFunc(const hiprtRay& ray, const void* data, void* payload, hiprtHit& hit) {...}

We set the function name in the hiprtFuncNameSet struct, which represents one entry in the custom function table:

Copied!

hiprtFuncNameSet funcNameSet;
funcNameSet.intersectFuncName = "intersectFunc";

We assign the entry to the desired position in the table. Note how the two-dimensional table is flattened into the one-dimensional array.:

Copied!

constexpr int RayTypes = 2;
constexpr int GeomTypes = 3;
hiprtFuncNameSet funcNameSets[RayTypes * GeomTypes];

int geomTypeIndex = 1;
int rayTypeIndex = 2;
funcNameSets[rayTypeIndex * GeomTypes + geomTypeIndex] = funcNameSet;

We pass the table with function names and the dimension to the trace kernel compilation function. Alternatively, we can use hiprtBuildTraceKernels instead of hiprtBuildTraceKernelsFromBitcode:

Copied!

hiprtApiFunction function;
hiprtBuildTraceKernelsFromBitcode(..., GeomTypes, RayTypes, funcNameSets, &function);

Now, we have to create the custom function table itself:

Copied!

hiprtFuncTable funcTable;
hiprtCreateFuncTable(hiprtContext, GeomTypes, RayTypes, &funcTable);

Typically, we want to pass data corresponding to the custom primitives to the intersection function:

Copied!

hiprtFuncDataSet funcDataSet;
functDataSet.intersectFuncData = ...;
hiprtSetFuncTable(hiprtContext, funcTable, geomTypeIndex, rayTypeIndex, funcDataSet);

Finally, we pass the table to the traversal object. Note that in HIPRT-v2, the order of arguments of the traversal object has changed:

Copied!

hiprtSceneTraversalClosest tr(scene, ray, hiprtFullRayMask, hiprtTraversalHintDefault, nullptr, table);

In HIPRT-v2, besides the custom intersection functions, we can setup filter functions that can be used to filter some undesired intersections. For example, self-intersections or alpha masking. The signature of the function is almost the same as for intersection functions expect that the hit is a constant reference as the intersection is already found. We want to decide whether the found intersection should be filtered out (returning true), or reported as a proper hit (returning false):

Copied!

__device__ bool filterFunc(const hiprtRay& ray, const void* data, void* payload, const hiprtHit& hit) {...}

The filter function name is passed in the same manner as the intersection function name:

Copied!

hiprtFuncNameSet funcNameSet;
funcNameSet.filterFuncName = "filterFunc";

Compilation and Bitcode linking

The second big change that we made is related to the trace kernel compilation process. In earlier versions, the trace kernel code was assembled on-the-fly and compiled by the HIP runtime compilation API. However, the introduction of Bitcode linking in HIP opened up new possibilities for device code linking that were previously unavailable. We have leveraged this technology to link the precompiled HIPRT traversal code to user-provided application code, resulting in a much cleaner and faster compilation process. The API now provides the hiprtBuildTraceKernelsFromBitcode function for this new approach that internally does all the work online.

It is also possible to perform bitcode linking manually offline, which might be useful for some cases, for example, when the online linking is not possible (due to various constraints), or when we want to save time needed for the linking itself.

We need the following three files:

User kernel code: user_code.cpp
hiprt compiled bitcode: hiprt02000_amd_lib_win.bc (a part of the HIPRT SDK)
the custom function table: custom_function_table.cpp (see below)

For the following commands, we assume that the HIP SDK is installed on the system, including hipcc and clang from the HIP SDK in the PATH. Use Windows Shell or PowerShell on Windows (WSL, MinGW, cygwin, etc. are not compatible with hipcc and clang).

We use the following command to compile user_code.cpp to bitcode, assuming we are building for Navi21 (gfx1030):

Copied!

hipcc -O3 -std=c++17 --offload-arch=gfx1030 -fgpu-rdc -c --gpu-bundle-output -c -emit-llvm -I../../ -ffast-math -D BLOCK_SIZE=64 -D SHARED_STACK_SIZE=16 user_code.cpp -parallel-jobs=15 -o user_code.bc

Note that BLOCK_SIZE and SHARED_STACK_SIZE are only needed if we use hiprtGlobalStack.

We use the following command to link the user bitcode with the HIPRT library bitcode:

Copied!

clang -fgpu-rdc --hip-link --cuda-device-only --offload-arch=gfx1030 user_code.bc  hiprt02000_amd_lib_win.bc -o offline_linked_user_code.hipfb

If we have filter functions or intersection functions in our code, we have to write the custom function table manually (custom_function_table.cpp). Assuming we have a filter function called myFilter, the custom function looks as follows:

Copied!

#if defined(__CUDACC__)
#include <cuda_runtime.h>
#include <cmath>
#else
#include <hip/hip_runtime.h>
#endif
#include <hiprt/hiprt_device.h>

__device__ bool myFilter(const hiprtRay& ray, const void* data, void* payload, const hiprtHit& hit);

__device__ bool intersectFunc(
	unsigned int geomType, unsigned int rayType, const hiprtFuncTableHeader& tableHeader, const hiprtRay& ray, void* payload, hiprtHit& hit)
{
	const unsigned int	index = tableHeader.numGeomTypes * rayType + geomType;
	const void* data  = tableHeader.funcDataSets[index].intersectFuncData;
	switch (index)
	{
	default: {
		return false;
	}
	}
}

__device__ bool filterFunc(
    unsigned int geomType, unsigned int	rayType, const hiprtFuncTableHeader& tableHeader, const hiprtRay& ray, void* payload, const hiprtHit& hit)
{
	const unsigned int	index = tableHeader.numGeomTypes * rayType + geomType;
	const void* data  = tableHeader.funcDataSets[index].filterFuncData;
	switch (index)
	{
	case 0: {
		return myfilter( ray, data, payload, hit );
	}
	default: {
		return false;
	}
	}
}

Notice how the ray type and geometry type are mapped to the index (see the previous section). We can add more custom functions in the same manner.

We use the following command to compile custom_function_table.cpp:

Copied!

hipcc -O3 -std=c++17 --offload-arch=gfx1030 -fgpu-rdc -c --gpu-bundle-output -c -emit-llvm -I../../ -ffast-math custom_function_table.cpp -parallel-jobs=15 -o custom_function_table.bc

We use the following command to link everything together:

Copied!

clang -fgpu-rdc --hip-link --cuda-device-only --offload-arch=gfx1030 user_code.bc  custom_function_table.bc hiprt02000_amd_lib_win.bc -o offline_linked_user_code.hipfb

Once we have the linked hip fat binary file, you can load it as a HIP module, and query necessary function pointers from it.

Other changes

We introduced ray traversal hints for Navi3x, which provide a new level of control over performance optimization by allowing users to add additional knowledge about input ray distribution, such as shadow or reflection rays. These traversal hints can be specified via the traversal object constructor:

Copied!

hiprtSceneTraversalClosest tr(scene, ray, hiprtTraversalHintShadowRays);

For user convenience, we added support for transformation matrices. In the previous versions, we supported only the SRT transformations defined by the components (SRT). Now, the user can specify the transformation type in the hiprtSceneBuildInput struct:

Copied!

hiprtSceneBuildInput sceneInput;
sceneInput.frameType  = hiprtFrameTypeMatrix;

To distinguish between the transformation structures, we have two structures: hiprtFrameSRT for the component-wise representation and hiprtFrameMatrix for the matrix representation. Note that regardless of the type, HIPRT internally correctly handles interpolation of frames in cases of motion blur.

We can specify minimum t value in the hiprtRay struct that might be useful, for example, for subsurface scattering. Note that the time parameter was moved to the traversal object constructor as the last argument:

Copied!

hiprtRay ray;
ray.minT = 0.1f;
hiprtSceneTraversalClosest tr(scene, ray, hiprtFullRayMask, hiprtTraversalHintDefault, nullptr, nullptr, 0, time);

We made the construction API thread-safe, allowing multiple hiprtGeometry structures to be constructed concurrently using multiple threads with HIP streams.
We extended the API to allow user to compile multiple templated trace kernels by passing the number of trace functions, an array of names of the trace functions, and array of the output functions:

Copied!

int numFunctions = 2;
const char* funcNames = {"Trace<true>", "Trace<false>"};
hiprtApiFunction functionsOut[2];
hiprtBuildTraceKernelsFromBitcode(..., numFunctions, funcNames, ..., functionsOut);