WMMA benefits for ML and general compute workloads

Originally posted: June 29, 2023

Last updated: February 6, 2024

The new Wave Matrix Multiply Accumulate (WMMA) instructions added in HLSL Shader Model 6.8 allow shader developers to accelerate General Matrix Multiplication (GEMM) operations of the form:

$$D = A \times B + C$$

where A, B, C, and D are matrices.

WMMA instructions have all threads in a wave cooperate on a single matrix-multiply operation, achieving higher efficiency and throughput than was previously possible with SM 6.7 or earlier instructions.
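To make the operation concrete, here is a minimal CPU sketch in Python of the D = A × B + C computation that WMMA accelerates, written two ways: a naive reference, and a version that splits the output into fixed-size tiles and accumulates tile-by-tile products along the shared dimension. The tiled loop only illustrates the general idea of a wave working on one fixed-size block of D at a time; the function names and the default 16 × 16 tile size are assumptions for illustration, not the WaveMMA API or AMD's hardware implementation.

```python
# Minimal CPU sketch of D = A x B + C, computed two ways.
# Function names and the default tile size (16) are illustrative assumptions;
# they are not taken from the HLSL WaveMMA specification.

Matrix = list[list[float]]


def gemm_reference(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Naive D = A x B + C: one dot product per output element."""
    m, k, n = len(a), len(b), len(b[0])
    d = [row[:] for row in c]  # copy C so the input is not modified
    for i in range(m):
        for j in range(n):
            acc = d[i][j]
            for p in range(k):
                acc += a[i][p] * b[p][j]
            d[i][j] = acc
    return d


def gemm_tiled(a: Matrix, b: Matrix, c: Matrix, tile: int = 16) -> Matrix:
    """Same result, but D is produced one tile x tile block at a time.

    Each block of D is built by accumulating a sequence of small
    tile-by-tile products along K (a hypothetical stand-in for the
    fixed-shape multiply-accumulate a wave performs per instruction).
    """
    m, k, n = len(a), len(b), len(b[0])
    d = [row[:] for row in c]
    for i0 in range(0, m, tile):          # rows of the output block
        for j0 in range(0, n, tile):      # columns of the output block
            for p0 in range(0, k, tile):  # step along the shared K dimension
                # D[i0.., j0..] += A[i0.., p0..] x B[p0.., j0..]
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = d[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        d[i][j] = acc
    return d


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = [[1, 0], [0, 1]]
    print(gemm_reference(a, b, c))      # [[20, 22], [43, 51]]
    print(gemm_tiled(a, b, c, tile=1))  # [[20, 22], [43, 51]]
```

Each output element still accumulates its products in the same order in both versions, so they return identical results; tiling only changes which part of D is being worked on at any moment, which is what lets a group of threads share the loads of A and B for one block.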

GEMM operations have many uses in signal processing, physics simulation, machine learning, and computer vision.

Some examples include:

Performing Fast Fourier Transforms for signal processing, such as audio, radio, or radar applications
Applying filters and post-processing to images
Calculating the deformation of objects and fluids, such as in fluid and physics simulations or molecular simulations
Implementing deep learning operators, for example convolutions, multi-layer perceptrons, neural radiance fields, and so on
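Many of these workloads reduce to the same D = A × B + C form. As one hedged illustration, a batched fully connected layer of a multi-layer perceptron maps the input batch to A, the weight matrix to B, and the bias (copied into every row) to C. The Python sketch below assumes row-major lists of lists; the function names are hypothetical and nothing here is taken from the WaveMMA API.

```python
# Sketch: one fully connected (dense) layer of a multi-layer perceptron
# expressed as D = A x B + C. Names and shapes are illustrative only.


def gemm(a, b, c):
    """D = A x B + C for row-major lists of lists."""
    k, n = len(b), len(b[0])
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(len(a))
    ]


def dense_layer(x, weights, bias, relu=True):
    """Batched dense layer Y = X x W + b, optionally followed by ReLU.

    x:       batch x in_features          -> A
    weights: in_features x out_features   -> B
    bias:    out_features, repeated per row -> C
    """
    c = [list(bias) for _ in x]  # broadcast the bias into the accumulator
    y = gemm(x, weights, c)
    if relu:
        y = [[max(0.0, v) for v in row] for row in y]
    return y


if __name__ == "__main__":
    x = [[1.0, 2.0], [-1.0, 0.5]]
    w = [[1.0, -1.0, 0.5], [0.0, 2.0, 1.0]]
    bias = [0.5, -1.0, 0.0]
    print(dense_layer(x, w, bias))              # [[1.5, 2.0, 2.5], [0.0, 1.0, 0.0]]
    print(dense_layer(x, w, bias, relu=False))  # [[1.5, 2.0, 2.5], [-0.5, 1.0, 0.0]]
```

Stacking several such layers, each feeding its output into the next as A, gives a full multi-layer perceptron, so almost all of its arithmetic lands in GEMM operations of this shape.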

For more information on WMMA, see:

Release notes for the Microsoft® Agility SDK Preview Release v1.711.3, which includes support for WaveMMA.
A preview AMD Software: Adrenalin Edition™ driver, which includes the AMD implementation of the current WaveMMA specification for AMD Radeon™ RX 7000 Series GPUs.
rocWMMA on GitHub.

Find out more about WMMA and matrix cores here on GPUOpen
