Optimization Archives

Using Ryzen™ Threadripper for Game Development – optimising UE4 build times

Guest post by Sebastian Aaltonen, co-founder of Second Order. It covers optimising engine build times and asset production on AMD Ryzen Threadripper processors.

Understanding GPU context rolls

Learn what context rolls on our GPUs are, how they apply to the pipeline and how they’re managed, and how to analyse them to find out whether they’re a limiting factor in the performance of your game or application.

Reducing Vulkan® API call overhead

This guest post, by Arseny Kapoulkine from Roblox, looks at the costs associated with calling various Vulkan functions tens or hundreds of thousands of times per frame, and ways to bring them down.

First Steps When Implementing FP16

Half-precision (FP16) computation is a performance-enhancing GPU technology that has long been exploited in console and mobile devices, but was not previously used or widely available in mainstream PC development.

CPU Core Count Detection on Windows®

Due to architectural differences between Zen and our previous processor architecture, Bulldozer, developers need to take care when using the Windows® APIs for processor and core enumeration.

Stable Barycentric Coordinates

The AMD GCN Vulkan extensions allow developers to get access to the barycentric coordinates at the fragment-shader level.

Optimizing GPU occupancy and resource usage with large thread groups

Sebastian Aaltonen, co-founder of Second Order Ltd, talks about how to optimize GPU occupancy and resource usage of compute shaders that use large thread groups.

Live VGPR Analysis with Radeon™ GPU Analyzer

This tutorial explains how to use Radeon GPU Analyzer (RGA) to produce a live VGPR analysis report for your shaders and kernels. Basic RGA usage knowledge is assumed.

Using Sub DWord Addressing on AMD GPUs

Sub DWord Addressing is a feature of the AMD GCN architecture which allows the efficient extraction of 8-bit and 16-bit values from a 32-bit register.

Profiling video memory with Windows® Performance Analyzer

A guide to using the Windows Performance Analyzer tool, with a focus on video resources.

Optimizing Terrain Shadows

One thing that is often forgotten is shadow map rendering. Because the tessellation level of the terrain is optimized for the primary camera rather than the shadow camera, the two are often badly mismatched and shadow maps end up extremely over-tessellated.

Leveraging Asynchronous Queues for Concurrent Execution

Understanding concurrency (and what breaks it) is extremely important when optimizing for modern GPUs.

Engines and APIs

Hybrid RT and samples

Our SDKs and libraries

Content creation

Blogs and videos