If you have supported Crossfire™ or Eyefinity™ in your previous titles, then you have probably already used our AMD GPU Services (AGS) library.  A lot of new features have been added to AGS recently, so whether or not you have used the library before, it is worth taking a look at the latest version.

GPU knowledge is power!

When initializing the library, you can optionally pass in a struct that returns some useful information about your GPU.  This includes information such as whether your AMD GPU is based on the GCN architecture or not, which may help you choose which code path or specific features your game uses.  If you want to identify the GPU more specifically, then we also provide the adapter string and the device and revision ids.

Note that the revision id is now just as important as the device id in identifying your GPU! This is because AMD has graphics hardware that shares a device id but is differentiated by revision id.  For example, the Radeon™ Fury, Radeon Fury X and Radeon Nano all share the same device id of 0x7300 but have different revision ids of 0xCB, 0xC8 and 0xCA respectively.  The full list of device and revision ids can be found here: http://developer.amd.com/resources/hardware-drivers/ati-catalyst-pc-vendor-id-1002-li/

Another potentially useful field is the driver version.  If your game requires a minimum driver version, then AGS is the way to query for it.  Note that this is not the same as the Catalyst Version or Radeon Software Version, as those refer to the version of the entire software suite.  The driver version field refers to the DirectX™ parts of the driver and will be of the format Year.Major.Minor.Point, e.g. 16.15.2401.1002 for the 16.5.1 Crimson Hotfix.

AGSContext* agsContext = nullptr;
AGSGPUInfo gpuInfo;
if ( agsInit( &agsContext, nullptr, &gpuInfo ) == AGS_SUCCESS )
{
    printf( "%s, device id: 0x%04X, revision id: 0x%02X\n",
            gpuInfo.adapterString ? gpuInfo.adapterString : "unknown GPU",
            gpuInfo.deviceId, gpuInfo.revisionId );
    printf( "Driver version: %s\n", gpuInfo.driverVersion );
}
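If your game needs a minimum driver version, one way to use the string is to split it into its four numeric fields and compare them in order. The sketch below is only an illustration and is not part of AGS: `parseDriverVersion` and `meetsMinimumDriver` are hypothetical helpers, and it assumes the string always has the four-part dotted form described above and that newer drivers compare higher field by field.

```cpp
#include <array>
#include <cstdio>

// Hypothetical helper, not part of AGS: splits "a.b.c.d" into four integers.
// Returns false if the string does not have exactly that shape.
bool parseDriverVersion( const char* text, std::array<int, 4>& out )
{
    if ( text == nullptr )
        return false;

    char trailing = 0;
    // The trailing %c only converts if unexpected characters follow the fourth field
    return std::sscanf( text, "%d.%d.%d.%d%c", &out[0], &out[1], &out[2], &out[3], &trailing ) == 4;
}

// Hypothetical helper: true if the installed version is at least the required one
bool meetsMinimumDriver( const char* installed, const char* required )
{
    std::array<int, 4> have = {}, need = {};
    if ( !parseDriverVersion( installed, have ) || !parseDriverVersion( required, need ) )
        return false; // Unknown format: treat as "not verified"

    return have >= need; // std::array compares field by field, left to right
}
```

In a game you would pass gpuInfo.driverVersion as the first argument and fall back to a warning dialog, rather than a hard failure, when the check returns false.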

Picking sensible graphics defaults made easy

A common problem in PC games development is picking sensible out-of-the-box settings based on the PC specifications.  In the past we have suggested using device ids to determine if you give your game the low, medium or high treatment.  The problem is that you need an up-to-date list of device ids (and revision ids!) and the knowledge of roughly what kind of performance to expect from each one.  You then have to patch your game with any device ids that come onto the market after release, which is far from ideal.

AGS now provides some useful information on your GCN GPUs, namely number of compute units, core clock speed and memory clock speed.  The peak teraflops figure is then derived as

Tflops = core clock speed * num compute units * 64 pixels per clk * 2 instructions per MAD

So for Radeon Fury hardware you would get 8+ Tflops, for a Radeon R9 290X 5.6 Tflops, for the Radeon R9 380 3.5 Tflops, and for an AMD A10 APU ~1 Tflop. This way you can future-proof your title with some very simple logic such as:

AGSGPUInfo gpuInfo;
if ( agsInit( &agsContext, nullptr, &gpuInfo ) == AGS_SUCCESS )
{
    if ( gpuInfo.fTFlops < 1.0f ) // This also catches the case of pre-GCN hardware where gpuInfo.fTFlops = 0
        Settings = Low; 
    else if ( gpuInfo.fTFlops < 3.0f ) 
        Settings = Medium; 
    else if ( gpuInfo.fTFlops < 6.0f ) 
        Settings = High; 
    else
        Settings = Ultra;
}

You can never have too many monitors 🙂

As before, the Eyefinity information can be retrieved from AGS so be sure to use this if you plan on supporting multiple monitor configurations. This will help you determine the layout of the monitors so you can adjust the camera’s field of view accordingly and ensure all your User Interface (UI) stays on the center monitor.  The AGS package comes with its own Eyefinity sample showing you how to use the API and how to set your camera up.  There is also a cool option that renders a 3×1 Eyefinity setup as three separate viewports to avoid the stretching that a very wide field of view often causes.  With the advent of low overhead APIs such as DirectX® 12 and Vulkan™, this is now a much more practical solution than before.
Eyefinity 3×1 with a single wide field of view camera


Eyefinity 3×1 with three standard field of view cameras

API Usage for retrieving Eyefinity information:


void GetEyefinityInfo( AGSContext* context, int primaryDisplayIndex )
{
    int numDisplaysInfo = 0;

    // Query the number of displays first
    if ( agsGetEyefinityConfigInfo( context, primaryDisplayIndex, nullptr, &numDisplaysInfo, nullptr ) == AGS_SUCCESS && numDisplaysInfo > 0 )
    {
        AGSEyefinityInfo eyefinityInfo = {};
        AGSDisplayInfo* displaysInfo = new AGSDisplayInfo[ numDisplaysInfo ];
        ZeroMemory( displaysInfo, numDisplaysInfo * sizeof( *displaysInfo ) );

        // Find out if this display has an Eyefinity config enabled
        if ( agsGetEyefinityConfigInfo( context, primaryDisplayIndex, &eyefinityInfo, &numDisplaysInfo, displaysInfo ) == AGS_SUCCESS )
        {
            if ( eyefinityInfo.iSLSActive )
            {
                printf( "Eyefinity enabled for display index %d:\n", primaryDisplayIndex );
                printf( " SLS grid is %d displays wide by %d displays tall\n", eyefinityInfo.iSLSGridWidth, eyefinityInfo.iSLSGridHeight );
                printf( " SLS resolution is %d x %d pixels\n", eyefinityInfo.iSLSWidth, eyefinityInfo.iSLSHeight );

                if ( eyefinityInfo.iBezelCompensatedDisplay )
                {
                    printf( " SLS is bezel-compensated\n" );
                }

                for ( int i = 0; i < numDisplaysInfo; i++ )
                {
                    printf( "Display %d\n", i );

                    if ( displaysInfo[ i ].iPreferredDisplay )
                    {
                        printf( " Preferred/main monitor\n" );
                    }

                    printf( " SLS grid coord [%d,%d]\n", displaysInfo[i].iGridXCoord, displaysInfo[i].iGridYCoord );
                    printf( " Base coord [%d,%d]\n", displaysInfo[i].displayRect.iXOffset, displaysInfo[i].displayRect.iYOffset );
                    printf( " Dimensions [%d x %d]\n", displaysInfo[i].displayRect.iWidth, displaysInfo[i].displayRect.iHeight );
                    printf( " Visible base coord [%d,%d]\n", displaysInfo[i].displayRectVisible.iXOffset, displaysInfo[i].displayRectVisible.iYOffset );
                    printf( " Visible dimensions [%d x %d]\n", displaysInfo[i].displayRectVisible.iWidth, displaysInfo[i].displayRectVisible.iHeight );
                }
            }
        }

        // Free the array even if the second query failed
        delete[] displaysInfo;
    }
}
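Once you know the grid width, the two camera setups described above come down to a little trigonometry. The following is a rough sketch under simplifying assumptions, not the method used by the AGS Eyefinity sample: it assumes identical displays in a single row, ignores bezel compensation, and the function names are hypothetical. The grid width would come from eyefinityInfo.iSLSGridWidth.

```cpp
#include <cmath>
#include <vector>

// Horizontal field of view (radians) for ONE camera spanning a flat row of
// gridWidth displays, given the field of view used for a single display.
// The projection plane becomes gridWidth times wider, so tan(fov/2) scales by gridWidth.
double wideCameraFov( double singleDisplayFov, int gridWidth )
{
    return 2.0 * std::atan( gridWidth * std::tan( singleDisplayFov * 0.5 ) );
}

// Yaw offsets (radians) for one camera per display column, each keeping the
// single-display field of view. Column 0 is the leftmost display.
// Assumes gridWidth >= 1.
std::vector<double> perDisplayYaw( double singleDisplayFov, int gridWidth )
{
    std::vector<double> yaw( gridWidth );
    const double centre = ( gridWidth - 1 ) * 0.5;
    for ( int i = 0; i < gridWidth; i++ )
        yaw[ i ] = ( i - centre ) * singleDisplayFov;
    return yaw;
}
```

For a 90 degree single-display field of view on a 3×1 grid, the single wide camera ends up at about 143 degrees, which is where the stretching at the edges comes from. The three-camera variant keeps 90 degrees per display and simply rotates the outer cameras by 90 degrees to either side.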

You can never have too many GPUs

Well, two GPUs in Crossfire is a good start, so let’s make sure you get the most out of them!  For DirectX 11, AGS gives you three choices: use the driver’s built-in peer-to-peer resource synchronization when running in AFR mode (this is the default), disable Crossfire altogether, or explicitly tag your resources for syncing using the new Crossfire API. Query the Crossfire GPU count using the API below. Be sure not to confuse this with the total number of GPUs in the system:

int numCrossfireGPUs = 0;
if ( agsGetCrossfireGPUCount( agsContext, &numCrossfireGPUs ) == AGS_SUCCESS )
{
    printf( "Crossfire GPU count = %d\n", numCrossfireGPUs );
}

Squeeze extra performance out of DirectX 11 using GCN driver extensions

DirectX 11 was released before the launch of AMD’s GCN architecture, so its API does not expose some important hardware features.  AGS allows developers to gain access to some of these features via driver extensions. Some of these are explained below:

UAV Overlap

With developers using more and more compute shaders, Write-After-Write (WAW) events are becoming more prevalent.  That is to say, if you write to an unordered access view (UAV) in one call, then write to the same UAV in the next call, the DirectX 11 driver will insert a barrier between the calls to avoid a WAW hazard.  If you can guarantee you are not writing to the same part of the UAV in the second call then the two calls could overlap on the GPU since there is no dependency between them. The Begin/EndUAVOverlap API in AGS does exactly this: it tells the driver not to insert any WAW barriers within the scope defined by these calls. The code would look something like this:

// Disable automatic WAW syncs
agsDriverExtensions_BeginUAVOverlap( m_agsContext );
// Submit back-to-back dispatches that write to the same UAV
m_deviceContext->Dispatch( ... );  // First half of UAV
m_deviceContext->Dispatch( ... );  // Second half of UAV
// Reenable automatic WAW syncs
agsDriverExtensions_EndUAVOverlap( m_agsContext );
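Because the driver no longer protects you inside the overlap scope, it can be worth checking your own guarantee in debug builds. The sketch below is one hypothetical way to do that on the CPU side; `UavWriteRange` and `rangesOverlap` are illustrative names and not part of AGS or DirectX, and it assumes each dispatch writes one contiguous range of UAV elements.

```cpp
#include <cstdint>

// Hypothetical description of the UAV elements one dispatch writes: [begin, end)
struct UavWriteRange
{
    uint32_t begin;
    uint32_t end;
};

// True if two non-empty half-open ranges share at least one element
bool rangesOverlap( const UavWriteRange& a, const UavWriteRange& b )
{
    return a.begin < b.end && b.begin < a.end;
}
```

Asserting that rangesOverlap returns false for the two dispatches before entering the Begin/EndUAVOverlap scope catches the case where a later change makes the halves overlap.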

MultiDrawIndirect

One of the main issues with the DirectX 11 API is its relatively high driver overhead on the CPU.  DrawInstancedIndirect is a great way to reduce the number of draw calls by batching together similar objects and generating the instance buffer on the GPU.  However, you can take this to the next level by batching multiple instance buffers together into one MultiDrawIndirect (MDI) call. The extension allows the following code:

// Submit n batches of DrawIndirect calls
for ( int i = 0; i < n; i++ )
    DrawIndexedInstancedIndirect( buffer, i * sizeof( cmd ) );

to be replaced by the following call:

// Submit all n batches in one call
agsDriverExtensions_MultiDrawIndexedInstancedIndirect( m_agsContext, n, buffer, 0, sizeof( cmd ) );
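The last two arguments in the call above are the byte offset of the first record and the stride between records in the argument buffer. As a sketch of what one record looks like, the struct below mirrors the five 32-bit arguments that D3D11 DrawIndexedInstancedIndirect reads. The struct and function names are hypothetical, and it is assumed here that `cmd` in the snippets above is a tightly packed record of this shape.

```cpp
#include <cstddef>
#include <cstdint>

// One record in the indirect argument buffer (hypothetical name).
// Field order follows the arguments of DrawIndexedInstanced.
struct DrawIndexedInstancedArgs
{
    uint32_t indexCountPerInstance;
    uint32_t instanceCount;
    uint32_t startIndexLocation;
    int32_t  baseVertexLocation;
    uint32_t startInstanceLocation;
};

static_assert( sizeof( DrawIndexedInstancedArgs ) == 20, "argument records must be tightly packed" );

// Byte offset of the i-th record when records are stored back to back,
// i.e. the offset the loop version passes for each draw
std::size_t argsByteOffset( std::size_t i )
{
    return i * sizeof( DrawIndexedInstancedArgs );
}
```

With records laid out like this, the loop issues draws at offsets 0, 20, 40 and so on, while the single multi-draw call covers the same records with a start offset of 0 and a stride of 20 bytes.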

Depth Bounds Testing

Depth bounds testing has been available in OpenGL for quite some time now and is an easy way to cull pixel shader work outside a specific depth range.  This is exposed in the agsDriverExtensions_SetDepthBounds call in AGS and is demonstrated in the SDK sample DepthBoundsTest11, linked below, along with our page for AGS.

If there is anything you would really like to see added to AGS, please let us know! Happy coding!

Resources

AGS

AMD GPU Services (AGS) Library

The AMD GPU Services (AGS) library provides software developers with the ability to query AMD GPU software and hardware state information that is not normally available through standard operating systems or graphics APIs.

You can download the library and samples from GitHub: