

Support both HIP and CUDA® with ease
The Orochi library loads HIP and CUDA® APIs dynamically, allowing you to switch between them at runtime. Orochi is named after a legendary Japanese dragon with eight heads and eight tails on a single body. In keeping with its namesake, Orochi enables a single library to use multiple backends at runtime.
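One way to picture a single library with several runtime backends is a table of function pointers that is filled in once, after the backend has been chosen. The C++ sketch below is a simplified illustration under stated assumptions, not Orochi's implementation. The Backend enum, the GpuApi struct, and the mock functions are all hypothetical. The mocks stand in for entry points that a real loader would resolve from the HIP or CUDA shared library.

```cpp
#include <cassert>
#include <string>

// Hypothetical backend identifiers. They illustrate choosing HIP or CUDA at
// runtime but are NOT Orochi's actual enum or function names.
enum class Backend { Hip, Cuda, None };

// Table of function pointers: application code calls only through this
// struct, so the same compiled code works with whichever backend fills it.
struct GpuApi {
    Backend backend;
    int (*deviceCount)();    // hypothetical entry point
    const char* (*name)();   // hypothetical entry point
};

// Mock implementations standing in for functions that a real loader would
// look up in the HIP or CUDA shared library at runtime.
static int hipDeviceCountMock() { return 1; }
static const char* hipNameMock() { return "HIP"; }
static int cudaDeviceCountMock() { return 2; }
static const char* cudaNameMock() { return "CUDA"; }

// Fill the table for the requested backend. A real library would open the
// shared library here and resolve each symbol.
GpuApi makeApi(Backend b) {
    switch (b) {
        case Backend::Hip:  return {Backend::Hip, hipDeviceCountMock, hipNameMock};
        case Backend::Cuda: return {Backend::Cuda, cudaDeviceCountMock, cudaNameMock};
        default:            return {Backend::None, nullptr, nullptr};
    }
}

// Application code: written once, it never refers to HIP or CUDA directly.
std::string describe(const GpuApi& api) {
    if (api.backend == Backend::None) return "no backend";
    assert(api.deviceCount != nullptr && api.name != nullptr);
    return std::string(api.name()) + ": " + std::to_string(api.deviceCount()) + " device(s)";
}
```

With these mock values, describe(makeApi(Backend::Cuda)) returns "CUDA: 2 device(s)". Switching backends only changes how the table is filled, so the calling code does not need to be recompiled.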
Latest version: v2.0
This release adds the following features:
- Support for many more CUDA/HIP functions than Orochi 1; coverage should be almost exhaustive.
- We will keep one branch per CUDA/HIP version combination (for example, release/hip5.7_cuda12.2), so developers can switch branches depending on their environment. If you need a combination that doesn't exist, open an issue on the project's GitHub repository.
- Change from Orochi 1: you need to install the CUDA SDK that corresponds to the branch you are using. For example, if you use branch release/hip5.7_cuda12.2, install CUDA SDK 12.2. CUDA is still loaded dynamically at runtime; only the SDK's headers are used at compile time.
- New demo for textures.
- New demo for Direct3D® 12 interop.
- Some refactoring and improvements to OrochiUtils.
- Orochi.h can be included in kernel files to use the oro* names.
- The bindings and naming between HIP and CUDA have been improved and restructured so they should be easier to maintain in future versions.
- Most of the Orochi/OrochiUtils API is unchanged, so updating a project from Orochi 1.0 to 2.0 should be straightforward.
- An experimental high-performance radix sort is included; we plan to publish its details in the future.
Features
- No need to compile two separate implementations for HIP and CUDA.
- Compile and maintain a single binary that can run on both AMD and NVIDIA® GPUs.
- Dynamically load the corresponding HIP/CUDA shared libraries depending on your platform.
- Combines the functionality offered by both HIPEW and CUEW into a single library.
- No need to link against CUDA (for the driver APIs) or HIP (for both the driver and runtime APIs) at build time.
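Loading a backend without linking against it at build time can be sketched with the POSIX dlopen and dlsym calls (on Windows, LoadLibrary and GetProcAddress play the same role). This is a minimal sketch, not Orochi's loader. The library names libamdhip64.so and libcuda.so.1, the probing order, and the helper names are assumptions. hipInit and cuInit are the real initialization entry points of HIP and of the CUDA driver API, but the shared InitFn type simplifies their enum return types to int.

```cpp
#include <dlfcn.h>
#include <string>
#include <vector>

// Simplified signature for hipInit/cuInit: both take an unsigned flags value
// and return a status enum (0 means success), modeled here as int.
using InitFn = int (*)(unsigned int);

// Hypothetical description of one backend to probe for.
struct Candidate {
    const char* name;        // label reported to the caller
    const char* library;     // shared library file name (assumption)
    const char* initSymbol;  // exported symbol to resolve
};

// Result of probing; all fields stay empty if no candidate could be loaded.
struct LoadedBackend {
    std::string name;
    void* handle = nullptr;
    InitFn init = nullptr;
};

// Assumed probing order on Linux: HIP first, then the CUDA driver.
std::vector<Candidate> defaultCandidates() {
    return {{"HIP", "libamdhip64.so", "hipInit"},
            {"CUDA", "libcuda.so.1", "cuInit"}};
}

// Try each candidate in order and return the first whose library opens and
// exports the expected symbol. No GPU library is linked at build time.
LoadedBackend loadFirstAvailable(const std::vector<Candidate>& candidates) {
    for (const Candidate& c : candidates) {
        void* handle = dlopen(c.library, RTLD_NOW | RTLD_LOCAL);
        if (handle == nullptr) continue;             // library not installed
        void* sym = dlsym(handle, c.initSymbol);
        if (sym == nullptr) {                        // library lacks the symbol
            dlclose(handle);
            continue;
        }
        LoadedBackend result;
        result.name = c.name;
        result.handle = handle;
        result.init = reinterpret_cast<InitFn>(sym);  // cast permitted by POSIX
        return result;
    }
    return {};
}
```

A caller would check that init is non-null before calling init(0), and call dlclose on the handle when finished. On a machine with neither driver installed, loadFirstAvailable(defaultCandidates()) returns an empty name instead of failing at startup, which is the practical benefit of not linking at build time.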

Requirements
To run an application compiled with Orochi, you need to install the appropriate driver for the GPU(s) available, which provides the corresponding .dll/.so files. Orochi automatically loads the matching shared library at runtime.
Version history
- Bitcode linking support
- Added OrochiUtils, a convenience wrapper
- Workaround for the AMD 22.7.1 driver regression (missing RTC)
- Support for more HIP and CUDA APIs
- Use only the CUDA driver APIs (except for RTC)
- Proper error handling
- Unit tests
- Bug fixes
- Initial release
NVIDIA and CUDA are registered trademarks of NVIDIA Corporation.