AMD Vega Instruction Set Architecture documentation

Understanding the instruction-level capabilities of any processor is a worthwhile endeavour for any developer writing code for it, even if the instructions that get executed …

Developer Quickstart: OpenCL on ROCm 1.6

Overview: ROCm 1.6 introduces big updates to our OpenCL compiler and runtime implementation, built on top of the ROCm software stack! This developer release includes …

Optimizing GPU occupancy and resource usage with large thread groups

When using a compute shader, it is important to consider the impact of thread group size on performance. Limited register space, memory latency and SIMD occupancy each affect shader performance in different ways. This article discusses the performance issues that can arise with large thread groups, and the techniques and optimizations that can dramatically increase performance when applied correctly.

TrueAudio Next Demo and Paper at GameSoundCon

In 2016, AMD brought TrueAudio Next to GameSoundCon, held Sept 27-28 at the Millennium Biltmore Hotel in Los Angeles. GameSoundCon caters to game …

Leveraging asynchronous queues for concurrent execution

Understanding concurrency (and what breaks it) is extremely important when optimizing for modern GPUs. Modern APIs like DirectX® 12 or Vulkan™ provide the ability to …

Anatomy Of The Total War Engine: Part IV

Happy Warhammer Wednesday! This week Creative Assembly’s Lead Graphics Programmer Tamas Rabel talks about how Total War: Warhammer utilized asynchronous compute to extract some extra …

ROCm, Do You Speak My Language?

The open-source ROCm stack offers several programming-language choices. Overall, the goal is to give you a range of tools to help solve the problem at …

Maxing out GPU usage in nBodyGravity

Today we’re going to take a look at how asynchronous compute can help you to get the maximum out of a GPU. I’ll be explaining …