Up and Running with CodeXL Analyzer CLI

Amit Ben-Moshe

Amit Ben-Moshe is a Technical Lead and a Principal Member of Technical Staff at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

About CodeXL Analyzer CLI

CodeXL Analyzer CLI is an offline compiler and performance analysis tool for OpenCL kernels, DirectX® shaders and OpenGL® shaders. Using CodeXL Analyzer CLI, you can compile kernels and shaders for a variety of AMD GPUs and APUs, independent of your system hardware, and generate AMD ISA, intermediate language and performance statistics for each target platform.

CodeXL Analyzer CLI is used by graphics engineers and by developers of parallel-computing applications to identify performance bottlenecks and optimize their code. It also serves as the backend for shader compilation and performance statistics generation in two AMD tool products: CodeXL’s Analyze mode and GPU PerfStudio’s Shader Analyzer.

Key Features

Compile OpenCL kernels and DirectX or OpenGL shaders to generate AMD ISA code, intermediate language code, performance statistics and program binaries.
Generate binaries and performance statistics for a variety of AMD GPUs and APUs, independent of the device that is physically installed on your system.
Observe how different optimizations and compilation chains affect the performance of your kernels and shaders: 32-bit vs. 64-bit, various compiler optimizations, kernel and shader code changes, and more.

CodeXL Analyzer CLI supports both Microsoft Windows® and Linux®.

Launching CodeXL Analyzer CLI

CodeXL Analyzer CLI commands consist of multiple command line switches. Some apply to all platforms (OpenCL, DirectX, OpenGL), while others are platform-specific. The key options that apply to all platforms are listed below:

Key basic options

Command line switch	Description	Comments
-s	Specifies the platform: “cl” for OpenCL, “hlsl” for DirectX and “glsl” for OpenGL	Each invocation handles a single platform
-s <platform> -h	Displays the help menu for the selected platform
-c	Target device for which output is generated	Can appear multiple times; if not present, all devices are targeted by default
-l	Lists the names of the supported devices
--isa	Generates textual ISA code and saves the result to the given full output path	The Analyzer adds the device name to the file name to distinguish the output of different devices
-a	Generates performance statistics and saves the result to the given full output path	The Analyzer adds the device name to the file name to distinguish the output of different devices
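
As a rough illustration of how the switches in the table combine, the following Python sketch assembles an argument list without running anything. Several details are assumptions rather than facts from this article: the executable name `CodeXLAnalyzer`, the double-hyphen spelling `--isa`, the device name `Tahiti` and the output paths. The helper `common_switches` is hypothetical, and the result still needs platform-specific switches and an input file before it is a complete command.

```python
import shlex

ANALYZER = "CodeXLAnalyzer"  # assumed executable name; adjust to your installation


def common_switches(platform, devices=(), isa_path=None, stats_path=None):
    """Return the platform-independent part of an Analyzer command line."""
    if platform not in ("cl", "hlsl", "glsl"):
        raise ValueError(f"unsupported platform: {platform!r}")
    args = [ANALYZER, "-s", platform]
    for device in devices:
        args += ["-c", device]  # -c may be repeated, once per target device
    if isa_path is not None:
        args += ["--isa", isa_path]  # textual ISA output
    if stats_path is not None:
        args += ["-a", stats_path]  # performance statistics output
    return args


# "Tahiti" and the output paths are placeholders; -l lists the real device names.
analyze = common_switches(
    "cl", devices=["Tahiti"], isa_path="out/isa.txt", stats_path="out/stats.csv"
)
print(shlex.join(analyze))
```

Building a list instead of a single string keeps each switch and its value as separate arguments, so paths that contain spaces survive if the list is later handed to `subprocess.run`.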

In the following sections, we will go through key command options for specific platforms. We will focus on the most commonly used commands rather than cover every available option. For the full list of options, use the -h command line switch.

Key OpenCL-specific options

For OpenCL kernels, CodeXL Analyzer CLI can compile high-level source code and extract AMD IL code and compiled binaries, in addition to textual ISA and performance statistics. Here are the options that are specific to OpenCL kernels:

Command line switch	Description	Comments
--il	Generate textual AMD IL code and save the result to the given output file (full file path)	The output file name is changed to distinguish the output of different devices
-b	Save the compiled binaries to the given output file (full file path)
--kernel	Generate output for the given kernel	Use --kernel all to target all kernels
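
The OpenCL switches above can be combined along these lines. This is a minimal sketch, not a confirmed invocation: the executable name `CodeXLAnalyzer`, the double-hyphen spelling of the long switches, the placement of the source file as a trailing positional argument, and all file names are assumptions, and `opencl_command` is a hypothetical helper.

```python
import shlex

ANALYZER = "CodeXLAnalyzer"  # assumed executable name; adjust to your installation


def opencl_command(source, kernel="all", il_path=None, binary_path=None):
    """Assemble an OpenCL-mode argument list from the switches in the table."""
    args = [ANALYZER, "-s", "cl", "--kernel", kernel]
    if il_path is not None:
        args += ["--il", il_path]  # textual AMD IL output
    if binary_path is not None:
        args += ["-b", binary_path]  # compiled binaries
    args.append(source)  # assumption: the input file is passed positionally
    return args


# File names are placeholders chosen for illustration.
cl_cmd = opencl_command("my_kernels.cl", il_path="out/il.txt", binary_path="out/kernels.bin")
print(shlex.join(cl_cmd))
```

Passing a specific kernel name instead of the default `all` restricts the output to that kernel.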

Key DirectX-specific options

For DirectX shaders, CodeXL Analyzer CLI can extract DX ASM code, in addition to textual ISA and performance statistics. Here are the options that are specific to DirectX shaders:

Command line switch	Description	Comments
-f	The name of the target entry point
-p	The shader profile (e.g. vs_5_0, ps_5_0, etc.)
--DumpMSIntermediate	Save the DX ASM code to the given full output path
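
A DirectX invocation needs both an entry point and a shader profile, which the sketch below makes mandatory. As before, this is illustrative only: the executable name `CodeXLAnalyzer`, the double-hyphen spelling of `--DumpMSIntermediate`, the trailing positional source file, the entry point name `VSMain` and the file names are assumptions, and `hlsl_command` is a hypothetical helper.

```python
import shlex

ANALYZER = "CodeXLAnalyzer"  # assumed executable name; adjust to your installation


def hlsl_command(source, entry_point, profile, asm_path=None):
    """Assemble a DirectX-mode argument list from the switches in the table."""
    if not entry_point or not profile:
        raise ValueError("an entry point (-f) and a shader profile (-p) are required")
    args = [ANALYZER, "-s", "hlsl", "-f", entry_point, "-p", profile]
    if asm_path is not None:
        args += ["--DumpMSIntermediate", asm_path]  # DX ASM output
    args.append(source)  # assumption: the input file is passed positionally
    return args


# "VSMain" and the file names are placeholders; vs_5_0 is one of the profiles named above.
hlsl_cmd = hlsl_command("shader.hlsl", "VSMain", "vs_5_0", asm_path="out/dxasm.txt")
print(shlex.join(hlsl_cmd))
```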

Key OpenGL-specific options

For OpenGL, CodeXL Analyzer CLI accepts only a single shader source file per invocation. The following option is specific to OpenGL shaders:

Command line switch	Description	Comments
-p	Specifies the shader type: Vertex, TessEval, Geometry, Fragment and compute	Tessellation control shaders are not supported by the Analyzer’s “glsl” mode

Note: CodeXL Analyzer CLI’s “glsl” mode, which accepts only a single shader, is deprecated and will be replaced in future versions with a new OpenGL mode, which will allow compiling and linking of whole OpenGL programs, and generation of more accurate ISA, performance statistics and per-stage binaries.
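
Because the “glsl” mode takes one shader and one shader type per invocation, a small guard can reject unsupported stages before anything is launched. The sketch below assumes the executable name `CodeXLAnalyzer`, a trailing positional source file, and that the shader type tokens are spelled exactly as in the table (whether they are case-sensitive is not stated here). `glsl_command` and the file name are hypothetical.

```python
import shlex

ANALYZER = "CodeXLAnalyzer"  # assumed executable name; adjust to your installation

# Shader types copied from the table; tessellation control is deliberately absent.
SHADER_TYPES = ("Vertex", "TessEval", "Geometry", "Fragment", "compute")


def glsl_command(source, shader_type):
    """Assemble a glsl-mode argument list for a single shader source file."""
    if shader_type not in SHADER_TYPES:
        raise ValueError(f"shader type {shader_type!r} is not supported in glsl mode")
    # assumption: the input file is passed positionally
    return [ANALYZER, "-s", "glsl", "-p", shader_type, source]


glsl_cmd = glsl_command("lighting.frag", "Fragment")  # placeholder file name
print(shlex.join(glsl_cmd))
```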

Usage Examples

Let’s have a look at the following .cl file (BinarySearch_Kernels.cl, taken from the AMD APP SDK):

__kernel void
binarySearch( __global uint4 * outputArray,
__const __global uint2 * sortedArray,
const unsigned int findMe)
{
    unsigned int tid = get_global_id(0);
    uint2 element = sortedArray[tid];
    if((element.x > findMe) || (element.y < findMe))
    {
        return;
    }
    else
    {
        outputArray[0].x = tid;
        outputArray[0].w = 1;
    }
}

__kernel void
binarySearch_mulkeys(__global int *keys,
__global uint *_input,
const unsigned int numKeys,
__global int *_output)
{
    int gid = get_global_id(0);
    int lBound = gid * 256;
    int uBound = lBound + 255;

    for(int i = 0; i < numKeys; i++)
    {
        if(keys[i] >= _input[lBound] && keys[i] <=_input[uBound])
            _output[i]=lBound;
    }
}

__kernel void
binarySearch_mulkeysConcurrent(__global uint *keys,
__global uint *_input,
const unsigned int inputSize,
const unsigned int numSubdivisions,
__global int *_output)
{
    int lBound = (get_global_id(0) % numSubdivisions) * (inputSize / numSubdivisions);
    int uBound = lBound + inputSize / numSubdivisions;
    int myKey = keys[get_global_id(0) / numSubdivisions];
    int mid;

    while(uBound >= lBound)
    {
        mid = (lBound + uBound) / 2;
        if(_input[mid] == myKey)
        {
            _output[get_global_id(0) / numSubdivisions] = mid;
            return;
        }
        else if(_input[mid] > myKey)
            uBound = mid - 1;
        else
            lBound = mid + 1;
    }
}