AMD GCN Assembly: Cross-Lane Operations

Cross-lane operations are an efficient way to share data between wavefront lanes. This article covers in detail the cross-lane features that GCN3 offers. I’d like …

The Art of AMDGCN Assembly: How to Bend the Machine to Your Will

The ability to write code in assembly is essential to achieving the best performance for a GPU program. In a previous blog we described how …

ROCm with Rapid Harmony: Optimizing HSA Dispatch

We previously looked at how to launch an OpenCL™ kernel using the HSA runtime. That example showed the basics of using the HSA runtime. Here we’ll …

ROCm With Harmony: Combining OpenCL, HCC, and HSA in a Single Program

In a previous blog we discussed the different languages available on the ROCm platform. Here we’ll show you how to combine several of these …

ROCm, Do You Speak My Language?

The open-source ROCm stack offers several programming-language choices. Overall, the goal is to give you a range of tools to help solve the problem at …

Getting Started with ROCm: Components, Platforms & Installation

Are You Ready to ROCK! The ROCm Platform delivers on the vision of the Boltzmann Initiative, bringing new opportunities in GPU Computing Research. On November …

Getting Up to Speed on the CodeXL GPU Profiler with Radeon Open Compute

With the announcement of the Boltzmann Initiative and the recent releases of ROCK and ROCR, AMD has ushered in a new era of Heterogeneous Computing. …

HIP release 0.82

It’s been just under two months since we publicly launched the HIP repository, and I wanted to share a quick update on the work we’ve …

GPUOpen, an Uninhibited Path to Science Discovery, Exploring the Limits of Engineering, or Just Creating Your Artistic World of Wonder

The Open Path to Bring Forward Your Ideas to High-Performance GPU Computing. Welcome to the new Portal. I want to welcome you to the new …

A Brief Intro to the Heterogeneous Compute Compiler

In November, AMD launched the Boltzmann Initiative at Supercomputing 2015 with the goal of enabling developers to more easily employ the full compute potential of …