    VDR Follow Up – Fine Art of Film Grain

    Posted on April 4, 2016 (updated October 14, 2016) by Timothy Lottes
    dithering, film grain, HDR, VDR

    This is the first of a series of posts expanding on the ideas presented at GDC in the Advanced Techniques and Optimization of VDR Color Pipelines talk. This post details generation of symmetric grain ideal for traditional transfer functions like sRGB.

    Below is the original photograph used in the dithering section of the talk. The photograph was chosen for a mix of smooth and detailed areas in combination with hard-to-quantize desaturated colors.

    LottesGrain1

    The next image shows quantization to 8 steps in sRGB without any added grain. Each value is quantized to the step at the nearest linear distance (which has a visual advantage over nearest non-linear distance for small numbers of steps).

    LottesGrain2
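
The nearest-linear-distance rule can be sketched as follows. This is only an illustration, not the code used to produce the image: the function names are hypothetical, and the standard piecewise sRGB transfer function is assumed.

```python
def srgb_to_linear(c):
    """Standard sRGB decode for a value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """Standard sRGB encode for a value in [0, 1]."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1.0 / 2.4) - 0.055


def quantize_nearest_linear(linear_value, steps):
    """Return the index of the sRGB step whose linear value is closest to the input.

    Requires steps >= 2.
    """
    x = min(max(linear_value, 0.0), 1.0)
    # The two candidate steps that bracket the input in sRGB space.
    lo = min(int(linear_to_srgb(x) * (steps - 1)), steps - 2)
    hi = lo + 1
    lin_lo = srgb_to_linear(lo / (steps - 1))
    lin_hi = srgb_to_linear(hi / (steps - 1))
    # Compare distances in linear space, not in sRGB space.
    return lo if (x - lin_lo) <= (lin_hi - x) else hi
```

With 8 steps, a linear input of 0.008 lands on step 0 under this rule, whereas rounding in sRGB space would pick step 1. The two rules differ most in the darks, where the sRGB curve is steepest.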

    Next, the following repeating generic pseudo-random noise texture (a poor grain proxy) is added in linear space prior to quantization and conversion to sRGB.

    LottesGrain5

    This noise is very similar to the results of the “fract, sin/cos, dot” in-shader texture-free method described on Gregory Igehy’s Notes of Pseudo-Random Generator for Shaders. The resulting image after conversion is shown below.

    LottesGrain3

    While it is an improvement over quantization without grain, application of this grain results in various artifacts as the eye is distracted by the structure of the grain. Looking again at the noise texture, this time at only one channel, it is possible to see a lot of low-frequency content mixed into the noise.

    LottesGrain7

    To improve upon this, the noise can be shaped into a visually pleasing grain, for example by applying a high-pass filter along both the x and y axes. In this case a different cutoff frequency is used for each axis to leave a feeling of paper texture.

    LottesGrain8
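
The post does not say which high-pass filter was used. As a minimal sketch, one simple option is to subtract a wrap-around box blur along each axis, with a different blur radius (and so a different cutoff) per axis. The box blur, the radii and all function names below are assumptions chosen for illustration.

```python
import random


def box_blur_wrap(values, radius):
    """1D box blur with wrap-around, so the result still tiles."""
    n = len(values)
    width = 2 * radius + 1
    return [sum(values[(i + k) % n] for k in range(-radius, radius + 1)) / width
            for i in range(n)]


def high_pass_rows(image, radius):
    """Subtract the blurred (low-frequency) part of every row."""
    out = []
    for row in image:
        low = box_blur_wrap(row, radius)
        out.append([v - l for v, l in zip(row, low)])
    return out


def transpose(image):
    return [list(column) for column in zip(*image)]


def shape_noise(image, radius_x, radius_y):
    """High-pass along x, then along y, using a separate cutoff per axis."""
    tmp = high_pass_rows(image, radius_x)
    return transpose(high_pass_rows(transpose(tmp), radius_y))


# Hypothetical 32x32 white-noise channel standing in for the noise texture.
rng = random.Random(1)
noise = [[rng.random() for _ in range(32)] for _ in range(32)]
shaped = shape_noise(noise, radius_x=2, radius_y=6)
```

The output is centered on zero, so it would need to be rescaled before being stored in a UNORM texture. A larger radius removes only the lowest frequencies, while a smaller one removes more, which is how the two axes end up with different character.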

    An alternative could be to start with some other photographic source converted into a tiling texture. Both of these cases can have problems caused by an uneven distribution of values in the grain texture. It is possible to re-shape the grain texture into a perfectly balanced distribution of values using the following method.

    1. For all texels of a given channel build a 64-bit value: {32-bit channel intensity, 16-bit texel x coordinate, 16-bit texel y coordinate}.
    2. Randomly shuffle the ordering of the 64-bit values (to deal with duplicates).
    3. Use a radix sort to sort all the 64-bit values.
    4. Take the sorted position divided by the number of texels as the new texel channel intensity.
    5. Use the packed {16-bit texel coordinates} to scatter the new texel intensity back to the original image.
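
The steps above can be sketched for a single channel as follows. This is an offline illustration rather than the original tool: tuples stand in for the packed 64-bit values, Python's stable sort on intensity stands in for the radix sort (an assumption that lets the shuffle decide the order of duplicates), and the function name is hypothetical.

```python
import random


def equalize_channel(channel, seed=0):
    """Reshape one channel (a list of rows of numbers) to an even distribution in [0, 1)."""
    height, width = len(channel), len(channel[0])
    # Step 1: one record per texel: (intensity, x, y).
    records = [(channel[y][x], x, y) for y in range(height) for x in range(width)]
    # Step 2: shuffle so texels with equal intensity end up in random order.
    random.Random(seed).shuffle(records)
    # Step 3: stable sort on intensity only, so ties keep their shuffled order.
    records.sort(key=lambda record: record[0])
    # Steps 4 and 5: sorted position / texel count is the new intensity,
    # scattered back to the texel's original coordinates.
    count = len(records)
    out = [[0.0] * width for _ in range(height)]
    for rank, (_, x, y) in enumerate(records):
        out[y][x] = rank / count
    return out
```

Every output value occurs exactly once, so the histogram of the result is flat by construction, regardless of how skewed the input was.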

    Applying the above process to the prior grain texture yields the following result for a single channel.

    LottesGrain9

    And for the full color grain texture.

    LottesGrain6

    Application of this new grain to the original image yields the following high quality result (this is only a 3-bit per channel sRGB image).

    LottesGrain4

    Details on Grain Application

    The technique used for the images in this post is as follows:

    
    
    // Quantization steps, for 8-bit for example this would be 256.
    float quantizationSteps;
    
    // Linear color input.
    float3 color = ...;
    
    // This is used to limit the addition of grain around black to avoid increasing the black level.
    // This should be a pre-computed constant.
    // At zero, grain amplitude is limited such that the largest negative grain value would still quantize to zero.
    // Showing the example for sRGB, the ConvertSrgbToLinear() does sRGB to linear conversion.
    float grainBlackLimit = 0.5 * ConvertSrgbToLinear(1.0 / (quantizationSteps - 1.0));
    
    // This should also be a pre-computed constant.
    // With the exception of around the blacks, a constant linear amount of grain is added to the image.
    // Technically with low amounts of quantization steps, it would also be good to limit around white as well.
    // Given the primary usage case is high number of quantization steps,
    // limiting around whites is not perceptually important.
    // The largest linear distance between steps is always the highest output value.
    // This sets the constant linear amount of grain to fully dither the highest output value.
    // This does result in a higher-than-required amount of grain in the darks.
    // Using 0.75 leaves overlap to ensure the grain does not disappear at the linear mid-point between steps.
    // The largest step is the linear distance between the two highest output values.
    float grainAmount = 0.75 * (1.0 - ConvertSrgbToLinear(1.0 - 1.0 / (quantizationSteps - 1.0)));
    
    // Point-sampled grain texture scaled to {-1.0 to 1.0}.
    // Note the grain is sampled without a sRGB-to-linear conversion.
    // Grain is a standard RGBA UNORM format (not sRGB labeled).
    float3 grain = ...;
    
    // Apply grain to linear color.
    color = grain * min(color + grainBlackLimit, grainAmount) + color;
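
Since both constants are meant to be pre-computed, a CPU-side sketch may help. It follows the comments in the listing: the black limit is half the linear value of the first step above black, and the grain amount is taken as 0.75 times the linear distance between the two highest output steps. That reading of the comments, the standard sRGB decode and the function names are assumptions.

```python
def srgb_to_linear(c):
    """Standard sRGB decode for a value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def grain_constants(quantization_steps):
    """Return (grain_black_limit, grain_amount) for sRGB output."""
    step = 1.0 / (quantization_steps - 1.0)
    # Half the linear value of the first step above black.
    black_limit = 0.5 * srgb_to_linear(step)
    # 0.75 of the largest linear gap, which sits between the top two steps.
    amount = 0.75 * (1.0 - srgb_to_linear(1.0 - step))
    return black_limit, amount


def apply_grain(color, grain, black_limit, amount):
    """Same arithmetic as the last line of the listing, for one channel."""
    return grain * min(color + black_limit, amount) + color


black_limit, amount = grain_constants(256.0)
```

For 256 steps the black limit comes out far smaller than the grain amount, so the `min` only takes effect for colors very close to black and the grain amplitude is constant everywhere else.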
    

    When grain is applied temporally, a per-frame offset added to SV_Position can be used to shift the grain texture from frame to frame. A {2,3} Halton Sequence with a 1024 frame period works quite well. This method of adding grain is quite fast, requiring only {1 TEX and 13 VALU instructions} extra to implement.

    
    
    // Example minimal shader (ideally grain would get folded into some other pass).
    // Showing with the associated VALU opcodes used interleaved in comments.
    cbuffer CB0 : register(b0) { int2 halton; float2 grainConst; };
    Texture2D texColor;
    Texture2D texGrain;
    float3 main(float4 vpos : SV_Position) : SV_Target {
      // 2x V_CVT_I32_F32
      int3 pos = int3(vpos.xy, 0);
      float3 color = texColor.Load(pos).rgb;
      // 2x V_ADD_I32, 2x V_BFE_U32
      pos.xy = (pos.xy + halton) & 255;
      float3 grain = texGrain.Load(pos).rgb;
      // 3x V_ADD_F32, 3x V_MIN_F32, 3x V_MAC_F32
      return grain * min(color + grainConst.x, grainConst.y) + color; }
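
The per-frame `halton` constant consumed by the shader above could be generated on the CPU along these lines. This is a sketch under stated assumptions: the 256-texel grain texture size is inferred from the `& 255` mask, and scaling the sequence to whole texels as well as the function names are illustrative choices.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a non-negative integer, in [0, 1)."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def grain_offset(frame, period=1024, texture_size=256):
    """Integer texel offset for this frame from a {2,3} Halton sequence."""
    i = frame % period  # the sequence repeats every `period` frames
    return (int(radical_inverse(i, 2) * texture_size),
            int(radical_inverse(i, 3) * texture_size))


offsets = [grain_offset(frame) for frame in range(4)]
```

Consecutive frames get offsets that are far apart, which helps avoid the grain appearing to slide across the screen.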
    
    Timothy Lottes is a member of the Developer Technology Group at AMD. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

    1 Comment

    Mathias Rauen says:
    October 14, 2016 at 2:49 pm

    Good to see dithering finally getting some attention. I’ve been using it in my video renderer (madVR) for more than 7 years now. FWIW, your random noise image looks much worse than it should/could. Here’s how proper TPDF dithering (using a white noise texture) looks like at 3bit:

    http://madshi.net/LottesGrainTPDF.png

    Of course noise level is still pretty high. If you prefer lower noise levels, try high-quality ordered dithering like this:

    http://madshi.net/LottesGrainOrdered.png

    ©2017 Advanced Micro Devices, Inc. OpenCL™ and the OpenCL™ logo are trademarks of Apple, Inc., used with permission by Khronos.