The new Wave Matrix Multiply Accumulate (WMMA) instructions added in HLSL Shader Model 6.8 allow shader developers to accelerate General Matrix Multiplication (GEMM) operations of the form D = A × B + C, where A and B are the input matrices, C is the accumulator, and D is the result.
WMMA instructions task all threads in a wave to collaboratively perform a matrix-multiply operation with higher efficiency and throughput than previously achievable using SM 6.7 or earlier instructions.
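To make the operation concrete, here is a minimal scalar HLSL compute-shader sketch of the same D = A × B + C computation written without WMMA, purely to illustrate what the new instructions accelerate. The 16×16×16 tile size, buffer layouts, and register bindings are illustrative assumptions; each thread below produces one output element on its own, whereas WMMA has the whole wave cooperate on the tile in hardware.

```hlsl
// Scalar reference for D = A * B + C (no WMMA), for illustration only.
// Matrix dimensions, layouts, and bindings are illustrative assumptions.
#define M 16
#define N 16
#define K 16

StructuredBuffer<float>   A : register(t0); // M x K, row-major
StructuredBuffer<float>   B : register(t1); // K x N, row-major
StructuredBuffer<float>   C : register(t2); // M x N, row-major
RWStructuredBuffer<float> D : register(u0); // M x N, row-major

[numthreads(16, 16, 1)]
void CSMain(uint3 tid : SV_DispatchThreadID)
{
    uint row = tid.y;
    uint col = tid.x;
    if (row >= M || col >= N)
        return;

    // Each thread computes a single element of D independently; WMMA
    // instead lets all threads in a wave cooperate on the whole tile.
    float acc = C[row * N + col];
    for (uint k = 0; k < K; ++k)
        acc += A[row * K + k] * B[k * N + col];

    D[row * N + col] = acc;
}
```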
GEMM operations have many uses in signal processing, physics simulations, machine learning, and computer vision.
Some examples include:
- Performing Fast Fourier Transforms for signal processing, such as audio, radio, or radar applications
- Applying filters and post-processing to images
- Calculating the deformation of objects and fluids, for example in physics, fluid, or molecular simulations
- Implementing deep learning operators, for example convolutions, multi-layer perceptrons, neural radiance fields, and so on.
For more information on WMMA, see:
- Release notes for the Microsoft® Agility SDK Preview Release v1.711.3, which includes support for WaveMMA.
- A preview AMD Software: Adrenalin Edition™ driver, which includes the AMD implementation of the current WaveMMA specification on AMD Radeon™ RX 7000 Series GPUs.
- rocWMMA on GitHub.
Find out more about WMMA and matrix cores here on GPUOpen:
- How to accelerate AI applications on RDNA 3 using WMMA – a quick how-to guide for using the WMMA feature with our RDNA 3 GPU architecture, using a Hello World example.
- AMD matrix cores (amd-lab-notes) – the first post in the ‘AMD lab notes’ series, which takes a look at AMD’s Matrix Core technology and how best to use it to speed up your matrix operations.