Looking for knowledge beyond our software?
Explore our continually-growing library of technical blogs, written by AMD engineers and guest game developers. Benefit from their valuable experience covering general development techniques, developing with AMD hardware, ray tracing, HPC, ML, Vulkan®, DirectX®, Unreal Engine®, and lots more.
Some of our recent popular blogs
Find out what you need to get started with Work Graphs for DirectX 12, including the software required, configuration, compiling, and more.
The GPUOpen Matrix Compendium covers how matrices are used in 3D graphics and implementations in host code and shading languages. It’s a growing guide, so keep checking back!
In this blog series, we share the lessons learned from tuning a wide range of scientific applications, libraries, and frameworks for AMD GPUs.
The “why” of multi-resolution geometric representation using Bounding Volume Hierarchy for ray tracing
The benefits of the level-of-detail technique for ray tracing are not trivial. This blog explores the issues, giving the rationale for our new technique.
Make sure you don't miss out on some of our perennially popular blogs too!
Guest post by Sebastian Aaltonen, co-founder of Second Order. It covers optimising engine build times and asset production when using AMD Ryzen Threadripper processors.
Sebastian Aaltonen, co-founder of Second Order Ltd, talks about how to optimize GPU occupancy and resource usage of compute shaders that use large thread groups.
Barriers control resource and command synchronisation in Vulkan applications and are critical to performance and correctness. Learn more here.
Cross-lane operations are an efficient way to share data between wavefront lanes. This article covers in detail the cross-lane features that GCN3 offers.
Browse through all our technical blogs in one place
|Page ID||Blog title||Description||Originally posted||Author||Category|
|~ID-000140||Optimized Reversible Tonemapper for Resolve||An optimized form of the tonemapping technique Brian Karis describes on Graphics Rants (Tone mapping): replace the luma computation with max3(red,green,blue).||26th January 2016||Timothy Lottes||Developer guides|
|~ID-000237||Getting the Most Out of Delta Color Compression||DCC is a domain-specific compression that tries to take advantage of data coherence. It’s lossless, and adapted for 3D rendering. The key idea is to process whole blocks instead of individual pixels.||14th March 2016||Chris Brennan||Developer guides|
|~ID-001014||Fetching From Cubes and Octahedrons||For GPU-side dynamically generated data structures which need 3D spherical mappings, two of the most useful mappings are cubemaps and octahedral maps. This post explores the overhead of both mappings.||4th February 2016||Timothy Lottes||Developer guides|
|~ID-001211||Maxing Out GPU usage in nBodyGravity||Asynchronous compute can help you to get the maximum GPU usage. I’ll be explaining the details based on the nBodyGravity sample from Microsoft.||26th January 2016||Matthäus Chajdas||Developer guides|
|~ID-001861||Understanding Memory Coalescing on GCN||An explanation of how GCN hardware coalesces memory operations to minimize traffic throughout the memory hierarchy.||21st March 2016||Timothy Lottes||Developer guides|
|~ID-002113||Vulkan® Renderpasses||Renderpasses are objects designed to allow an application to communicate the high-level structure of a frame to the driver.||16th February 2016||Graham Sellers||Developer guides|
|~ID-002362||Using the Vulkan® Validation Layers||Vulkan validation layers make it easier to catch any mistakes, provide useful information beyond basic errors and minimize portability issues.||9th March 2016||Daniel Rakos||Developer guides|
|~ID-002779||Unlock the Rasterizer with Out-of-Order Rasterization||GCN hardware supports a special out-of-order rasterization mode which relaxes the ordering guarantee, and allows fragments to be produced out-of-order.||17th May 2016||Matthäus Chajdas||Developer guides|
|~ID-002814||Using Vulkan® Device Memory||This post serves as a guide on how to best use the various Memory Heaps & Memory Types exposed in Vulkan on AMD drivers, starting with some high-level tips.||21st July 2016||Timothy Lottes||Developer guides|
|~ID-002901||GCN Shader Extensions for Direct3D® and Vulkan®||One of the mandates of GPUOpen is to give developers better access to the hardware, and this post details extensions for Vulkan and Direct3D12 that expose additional GCN features to developers.||24th May 2016||Matthäus Chajdas||Developer guides|
|~ID-002904||Fast Compaction with mbcnt||With shader extensions, we provide access to a much better tool to get compaction done: GCN provides a special op-code for compaction within a wavefront.||20th May 2016||Matthäus Chajdas||Developer guides|
|~ID-003532||The Art of AMDGCN Assembly: How to Bend the Machine to Your Will||This article explains how to produce Hsaco from assembly code and also takes a closer look at some new features of the GCN architecture.||29th June 2016||Ben Sander||Developer guides|
|~ID-003720||Texel Shading||Game engines do most of their shading work per-pixel or per-fragment. But there is another alternative that has been popular in film for decades…||21st July 2016||Karl Hillesland||Developer guides|
|~ID-003855||Anatomy Of The Total War Engine: Part I||Tamas Rabel, Lead Graphics Programmer on the Total War series, provides a detailed look at the Total War renderer and digs deep into some of the optimizations that the team at Creative Assembly made for the brilliant Total War: Warhammer.||27th July 2016||Tamas Rabel||Developer guides|
|~ID-003859||Vulkan® and DOOM||This post takes a look at the interesting bits of helping id Software with their DOOM Vulkan effort, from the perspective of AMD’s Game Engineering Team.||10th November 2016||Timothy Lottes||Developer guides|
|~ID-003919||Anatomy Of The Total War Engine: Part II||Tamas Rabel from Creative Assembly discusses how performance was measured with the Total War Engine.||3rd August 2016||Tamas Rabel||Developer guides|
|~ID-003953||AMD GCN Assembly: Cross-Lane Operations||Cross-lane operations are an efficient way to share data between wavefront lanes. This article covers in detail the cross-lane features that GCN3 offers.||10th August 2016||Ben Sander||Developer guides|
|~ID-004082||Anatomy Of The Total War Engine: Part III||Here’s Tamas Rabel again with some juicy details about how Creative Assembly brought Total War to DirectX® 12.||10th August 2016||Tamas Rabel||Developer guides|
|~ID-004145||Anatomy Of The Total War Engine: Part IV||Tamas Rabel talks about how Total War: Warhammer utilized asynchronous compute to extract some extra GPU performance in DirectX® 12 and delves into the process of moving some of the passes in the engine to asynchronous compute pipelines.||16th August 2016||Tamas Rabel||Developer guides|
|~ID-004230||Anatomy Of The Total War Engine: Part V||The final instalment in Tamas Rabel’s insight into developing the Total War engine looks at Multi-GPU.||22nd August 2016||Tamas Rabel||Developer guides|
|~ID-004290||Using RapidFire for Virtual Desktop and Cloud Gaming||RapidFire SDK captures and encodes the input images entirely on the GPU and then copies the encoded result into the system memory for processing on the CPU.||27th September 2016||Bruno Stefanizzi||Developer guides|
|~ID-004423||Vulkan® Barriers Explained||Barriers control resource and command synchronisation in Vulkan applications and are critical to performance and correctness. Learn more here.||18th October 2016||Matthäus Chajdas||Developer guides|
|~ID-004487||AMD Driver Symbol Server||How to set up the AMD Driver Symbol Server in Visual Studio.||27th October 2016||Gareth Thomas||Developer guides|
|~ID-004567||Selecting the Best Graphics Device to Run a 3D Intensive Application||3D intensive application performance may suffer greatly if the best graphics device is not selected. As a developer you can easily fix this problem by adding only one line to your executable’s source code.||16th November 2016||Ken Mitchell||Developer guides|
|~ID-004755||Leveraging Asynchronous Queues for Concurrent Execution||Understanding concurrency (and what breaks it) is extremely important when optimizing for modern GPUs.||1st December 2016||Stephan Hodes||Developer guides|
|~ID-004824||Optimizing Terrain Shadows||One thing which is often forgotten is shadow map rendering. As the tessellation level of the terrain is not optimized for the shadow camera, but for the primary camera, this often results in a very strong mismatch and shadow maps end up getting extremely over-tessellated.||15th December 2016||Matthäus Chajdas||Developer guides|
|~ID-004861||Profiling video memory with Windows® Performance Analyzer||A guide to using the Windows Performance Analyzer tool, with a focus on video resources.||9th February 2017||Cristian Cutocheras||Developer guides|
|~ID-005536||Using Sub DWord Addressing on AMD GPUs||Sub DWord Addressing is a feature of the AMD GCN architecture which allows the efficient extraction of 8-bit and 16-bit values from a 32-bit register.||24th February 2017||Aditya Atluri||Developer guides|
|~ID-005567||Live VGPR Analysis with Radeon™ GPU Analyzer||This tutorial explains how to use Radeon GPU Analyzer (RGA) to produce a live VGPR analysis report for your shaders and kernels. Basic RGA usage knowledge is assumed.||21st March 2017||Amit Ben-Moshe||Developer guides|
|~ID-005928||CPU Core Count Detection on Windows®||Due to architectural differences between Zen and our previous processor architecture, Bulldozer, developers need to take care when using the Windows® APIs for processor and core enumeration.||14th September 2017||Ken Mitchell||Developer guides|
|~ID-005948||Content Creation Tools and Multi-GPU||mGPU isn’t just for gamers – if you’re a developer working on a game, you should think of using mGPU to make your life easier.||5th May 2017||Matthäus Chajdas||Developer guides|
|~ID-006013||Optimizing GPU occupancy and resource usage with large thread groups||Sebastian Aaltonen, co-founder of Second Order Ltd, talks about how to optimize GPU occupancy and resource usage of compute shaders that use large thread groups.||24th May 2017||Sebastian Aaltonen||Developer guides|
|~ID-006359||Understanding Vulkan® Objects||An important part of learning the Vulkan® API is to understand what types of objects are defined by it, what they represent and how they relate to each other.||7th August 2017||Adam Sawicki||Developer guides|
|~ID-006483||Stable Barycentric Coordinates||The AMD GCN Vulkan extensions allow developers to get access to the barycentric coordinates at the fragment-shader level.||30th August 2017||Rys Sommefeldt||Developer guides|
|~ID-006564||First Steps When Implementing FP16||Half-precision (FP16) computation is a performance-enhancing GPU technology long exploited in console and mobile devices, but not previously used or widely available in mainstream PC development.||20th April 2018||Tom Hammersley||Developer guides|
|~ID-006609||Deferred Path Tracing By Enscape||Insights from Enscape as to how they designed a renderer that produces path traced real time global illumination and can also converge to offline rendered image quality.||6th December 2017||Thomas Schander||Developer guides|
|~ID-006686||Understanding GPU context rolls||Learn what a context roll on our GPUs is, how they apply to the pipeline and how they’re managed, and what you can do to analyse them and find out if they’re a limiting factor in the performance of your game or application.||29th June 2018||Rys Sommefeldt||Developer guides|
|~ID-006834||Reducing Vulkan® API call overhead||This guest post, by Arseny Kapoulkine from Roblox, looks at the costs associated with calling various Vulkan functions tens or hundreds of thousands of times per frame, and ways to bring them down.||26th April 2018||Arseny Kapoulkine||Developer guides|
|~ID-007019||Decoding Radeon™ Vulkan® versions||A guide to using our machine-readable mapping that you can integrate into your software for decoding Radeon™ Vulkan® versions.||2nd August 2018||Rys Sommefeldt||Developer guides|
|~ID-007193||Using Ryzen™ Threadripper for Game Development – optimising UE4 build times||Guest post by Sebastian Aaltonen, co-founder of Second Order. It covers optimising engine build times and asset production when using AMD Ryzen Threadripper processors.||17th December 2018||Sebastian Aaltonen||Developer guides|
|~ID-008277||Integrating AMD FidelityFX into the Ego Engine||Tom Hammersley from Codemasters talks about integrating FidelityFX into the Ego Engine and implementing Contrast Adaptive Sharpening (CAS).||18th December 2019||Tom Hammersley||Developer guides|
|~ID-011842||Unreal Engine Performance Guide||Our one-stop guide to performance with Unreal Engine.||30th April 2020||GPUOpen||Developer guides|
|~ID-013579||Integrating RenderDoc for Unconventional Apps||One of our engineers explains a few small code changes that can help you integrate RenderDoc for more unconventional applications.||20th July 2020||Matthäus Chajdas||Developer guides|
|~ID-013823||Porting Detroit: Become Human from PlayStation® 4 to PC – Part 3||The final part of this joint series with Quantic Dream discusses shader scalarization, async compute, multithreaded render lists, memory management using our Vulkan Memory Allocator (VMA), and much more.||25th September 2020||Lou Kramer||Developer guides|
|~ID-013997||Porting Detroit: Become Human from PlayStation® 4 to PC – Part 1||Porting the PS4® game Detroit: Become Human to PC presented some interesting challenges. This first part of a joint collaboration from engineers at Quantic Dream and AMD discusses the decision to use Vulkan® and talks shader pipelines and descriptors.||21st September 2020||Lou Kramer||Developer guides|
|~ID-013998||Porting Detroit: Become Human from PlayStation® 4 to PC – Part 2||Part 2 of this joint post between Quantic Dream and AMD looks at non-uniform resource indexing on PC and for AMD cards specifically.||23rd September 2020||Lou Kramer||Developer guides|
|~ID-016632||AMD Ryzen CPU Performance Guide||Design faster. Render faster. Iterate faster. Our one-stop resource for getting great AMD Ryzen performance.||20th April 2021||GPUOpen||Developer guides|
|~ID-018196||How to get the most out of Smart Access Memory (SAM)||Smart Access Memory (SAM) provides the CPU with direct access to all video memory. These guidelines help you to improve CPU and GPU performance using SAM.||15th June 2021||Oskar Homburg||Developer guides|
|~ID-020439||Vulkan’s Best Practice layer now has AMD-specific checks||Introducing AMD checks for the Vulkan® Best Practice validation layer! Find out more about how it now incorporates many of our performance suggestions.||2nd September 2021||Nadav Geva||Developer guides|
|~ID-021100||Understanding Graphs in Radeon GPU Profiler and GPUView||Find out how to read and understand graphs in Radeon GPU Profiler and GPUView in order to optimize your game more effectively.||3rd December 2021||Adam Sawicki||Developer guides|
|~ID-024955||Integrating VRS in The Riftbreaker||EXOR Studios and AMD have collaborated to add Variable Rate Shading in The Riftbreaker. Read this guest blog to find out more!||13th May 2022||GPUOpen||Developer guides|
|~ID-025067||The “why” of multi-resolution geometric representation using Bounding Volume Hierarchy for ray tracing||The benefits of the level-of-detail technique for ray tracing are not trivial. This blog explores the issues, giving the rationale for our new technique.||9th May 2022||Takahiro Harada||Developer guides|
|~ID-030236||AMD matrix cores (amd-lab-notes)||This first post in the ‘AMD lab notes’ series takes a look at AMD’s Matrix Core technology and how best to use it to speed up your matrix operations.||14th November 2022||amd-lab-notes||Developer guides|
|~ID-030237||Finite difference method – Laplacian part 1 (amd-lab-notes)||The finite difference method is a powerful tool for computational physics. This post covers how to implement a GPU-accelerated finite difference code using AMD’s HIP API.||14th November 2022||amd-lab-notes||Developer guides|
|~ID-036749||Finite difference method – Laplacian part 2 (amd-lab-notes)||In this post we introduce two common optimizations that can be applied to the kernel to reduce data movement and bring us closer to the new peak: loop tiling to explicitly reduce memory loads and re-order the memory access pattern to improve caching.||4th January 2023||amd-lab-notes||Developer guides|
|~ID-038825||AMD Instinct™ MI200 GPU memory space overview – amd-lab-notes||This post introduces commonly-used memory spaces, identifies what makes each memory space unique, and discusses some common use-cases for each space.||9th March 2023||amd-lab-notes||Developer guides|
|~ID-038937||Introduction to profiling tools for AMD hardware (amd-lab-notes)||This post gives an overview of AMD’s open source profiling tools, helping you diagnose bottlenecks and understand how your application is using the hardware.||12th April 2023||amd-lab-notes||Developer guides|
|~ID-043432||Pre-multiplication, left-handed coordinate system as in DirectX® 9 – Matrix Compendium||GPUOpen Matrix Compendium: This page shows a selection of matrices in the coordinate system expected by DirectX® 9.||5th April 2023||Matrix Compendium||Developer guides|
|~ID-043433||Introduction – Matrix Compendium||The GPUOpen Matrix Compendium covers how matrices are used in 3D graphics and implementations in host code and shading languages. It’s a growing guide, so keep checking back!||5th April 2023||Matrix Compendium||Developer guides|
|~ID-043434||Pre-multiplication, right-handed coordinate system – Matrix Compendium||GPUOpen Matrix Compendium: This page shows a selection of matrices in a pre-multiplication, right-handed coordinate system.||5th April 2023||Matrix Compendium||Developer guides|
|~ID-043435||Post-multiplication, right-handed coordinate system as in OpenGL® – Matrix Compendium||GPUOpen Matrix Compendium: This page shows a selection of matrices in the coordinate system expected by OpenGL®.||5th April 2023||Matrix Compendium||Developer guides|
|~ID-043436||Post-multiplication, left-handed coordinate system – Matrix Compendium||GPUOpen Matrix Compendium: This page shows a selection of matrices in a post-multiplication, left-handed coordinate system.||5th April 2023||Matrix Compendium||Developer guides|
|~ID-044925||Finite difference method – Laplacian part 4 – AMD lab notes||In the fourth and final part of the Finite Difference Laplacian blog series, we cover scaling studies and cache size limitations.||18th July 2023||amd-lab-notes||Developer guides|
We have lots more documentation for you to discover!
Don’t miss our manual documentation! And if slide decks are what you’re after, you’ll find 100+ of our finest presentations here.
Browse all our useful samples. Perfect for when you need to get started, want to integrate one of our libraries, and much more.
Words not enough? How about pictures? How about moving pictures? We have some amazing videos to share with you!
The home of great performance and optimization advice for AMD RDNA™ 2 GPUs, AMD Ryzen™ CPUs, and so much more.
Our handy software release blogs will help you make good use of our tools, SDKs, and effects, as well as sharing the latest features with new releases.
Discover our publications.