![](https://gpuopen.com/wp-content/uploads/2020/01/blog_img_green_code_1920x700-300x109.jpg)
Reducing Vulkan® API call overhead
This guest post, by Arseny Kapoulkine from Roblox, looks at the costs associated with calling various Vulkan functions tens or hundreds of thousands of times per frame, and ways to bring them down.
This guest post, by Arseny Kapoulkine from Roblox, looks at the costs associated with calling various Vulkan functions tens or hundreds of thousands of times per frame, and ways to bring them down.
Half-precision (FP16) computation is a performance-enhancing GPU technology long exploited in console and mobile devices not previously used or widely available in mainstream PC development.
Due to architectural differences between Zen and our previous processor architecture, Bulldozer, developers need to take care when using the Windows® APIs for processor and core enumeration.
The AMD GCN Vulkan extensions allow developers to get access to the barycentric coordinates at the fragment-shader level.
Sebastian Aaltonen, co-founder of Second Order Ltd, talks about how to optimize GPU occupancy and resource usage of compute shaders that use large thread groups.
This tutorial explains how to use Radeon GPU Analyzer (RGA) to produce a live VGPR analysis report for your shaders and kernels. Basic RGA usage knowledge is assumed.
Sub DWord Addressing is a feature of the AMD GCN architecture which allows the efficient extraction of 8-bit and 16-bit values from a 32-bit register.
A guide to using the Windows Performance Analyzer tool, with a focus on video resources.
One thing which is often forgotten is shadow map rendering. As the tessellation level of the terrain is not optimized for the shadow camera, but for the primary camera, this often results in a very strong mismatch and shadow maps end up getting extremely over-tessellated.
Understanding concurrency (and what breaks it) is extremely important when optimizing for modern GPUs.
3D intensive application performance may suffer greatly if the best graphics device is not selected. As a developer you can easily fix this problem by adding only one line to your executable’s source code.
Cross-lane operations are an efficient way to share data between wavefront lanes. This article covers in detail the cross-lane features that GCN3 offers.