CPU core count detection on Windows

Due to architectural differences between Zen and our previous processor architecture, Bulldozer, developers need to take care when using the Windows® APIs for processor and core enumeration. …

Radeon GPU Profiler 1.0.2

Thanks (again!) Before we dive into a run over the release notes for the 1.0.2 release of Radeon GPU Profiler, we’d like to thank everyone …

Radeon GPU Profiler 1.0

Introduction and thanks Effective GPU performance analysis is a more complex proposition for developers today than it ever has been, especially given developments in how …

Optimizing GPU occupancy and resource usage with large thread groups

When using a compute shader, it is important to consider the impact of thread group size on performance. Limited register space, memory latency and SIMD occupancy each affect shader performance in different ways. This article discusses potential performance issues, and techniques and optimizations that can dramatically increase performance if correctly applied.

Profiling video memory with Windows Performance Analyzer

Budgeting, measuring and debugging video memory usage is essential for the successful release of game titles on Windows. As a developer, this can be efficiently achieved with the …

Selecting the Best Graphics Device to Run a 3D Intensive Application

Summary Many Gaming and workstation laptops are available with both (1) integrated power saving and (2) discrete high performance graphics devices. Unfortunately, 3D intensive application …

Blazing CodeXL 2.2 is here!

A new release of the CodeXL open-source developer tool is out! Here’s the hot new stuff in this release: New platforms support Support Linux systems …

The Art of AMDGCN Assembly: How to Bend the Machine to Your Will

The ability to write code in assembly is essential to achieving the best performance for a GPU program. In a previous blog we described how …

ROCm with Rapid Harmony : Optimizing HSA Dispatch

We previously looked at how to launch an OpenCL™ kernel using the HSA runtime. That example showed the basics of using the HSA Runtime. Here we’ll …

Performance Tweets Series: Rendering and Optimizations

Direct3D® 12 and Vulkan™ significantly reduce CPU overhead and provide new tools to better use the GPU. For instance, one common use case for the …