From version 4.0, the Windows® version of Compressonator supports GPU-based encoding with DirectX® Compute (DXC) or OpenCL™ (OCL) shaders. This tutorial explains how to use this feature.
Encoding with GPU Options
The new encoding options can be selected in the command-line tool using “-EncodeWith <DXC | OCL>”, or in the GUI application’s dialog using the “Encode with” drop-down selections “GPU_DirectX” or “GPU_OpenCL”.
As an alternative to the GUI or CLI tools, developers can add Compressonator SDK to their own application using:
- CMP_Core components for encoding with BC1 to BC5, which support processing with the OpenCL™ and DirectX® Compute pipelines.
- The sample compute HLSL shader interfaces that use CMP_Framework with a CMP_GPU_DXC component library.
- The CMP_Core BC1 to BC5 and BC7 encoders that use CMP_Framework with a CMP_GPU_OCL component library.
How Shader Compiling Is Set Up
When you use the Compressonator GUI or CLI with the GPU-based encoding options DXC or OCL for the first time, you will notice a long delay for each codec (BC1 to BC7) being used. This is because each of these shader sources must be compiled using the D3D or OpenCL™ compiler. Once compiled, the shader binary files are cached for quick reuse in subsequent runs. If compilation of these shaders fails on a particular GPU platform, the GUI or CLI tools automatically switch to CPU encoding, which guarantees that processed results are always available.
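The compile-once, cache, and fall-back behavior described above can be sketched as follows. This is only an illustration of the control flow, not Compressonator's implementation: `compile_shader`, `ShaderCompileError`, `select_encoder`, and the in-memory `cache` dictionary are hypothetical stand-ins (the real tools cache shader binaries as files).

```python
class ShaderCompileError(Exception):
    """Hypothetical error raised when a shader fails to build."""


def compile_shader(codec, backend, broken=()):
    # Hypothetical stand-in for the slow D3D / OpenCL compile step.
    if (codec, backend) in broken:
        raise ShaderCompileError(f"{codec} failed to build for {backend}")
    return f"<{backend} binary for {codec}>"


def select_encoder(codec, backend, cache, broken=()):
    """Return (device, shader_binary); device is 'GPU' or 'CPU'."""
    key = (codec, backend)
    if key in cache:
        return "GPU", cache[key]      # cached: no compile delay
    try:
        binary = compile_shader(codec, backend, broken)
    except ShaderCompileError:
        return "CPU", None            # fall back so a result is always produced
    cache[key] = binary               # first use: pay the compile cost once
    return "GPU", binary


cache = {}
print(select_encoder("BC7", "OCL", cache))   # compiles, then caches
print(select_encoder("BC7", "OCL", cache))   # served from the cache
print(select_encoder("BC1", "DXC", cache, broken={("BC1", "DXC")}))  # CPU fallback
```

Keying the cache on both codec and back end mirrors the per-codec delay described above: each codec pays the compile cost once per back end.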
Working With MIP Maps Using GPU Encoding
The MIP map levels that can be generated depend on the original image width and height, with the dimensions of each level being half those of its direct predecessor. For MIP level generation with the GPU, the resulting dimensions must also be divisible by four. The Compressonator GUI and CLI applications automatically adjust the user-requested levels, with a warning message, if a specific level cannot be achieved during processing.
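To make the divisible-by-four rule concrete, here is a minimal sketch of how a requested level count might be trimmed. It assumes that the requested count includes the base image and that a level is kept only when both of its dimensions are divisible by four; `gpu_mip_chain` is a hypothetical helper, not part of the Compressonator SDK, and the tools' actual adjustment logic may differ.

```python
def gpu_mip_chain(width, height, requested_levels):
    """Return the (width, height) of each MIP level usable for GPU encoding.

    Assumptions: every level halves both dimensions of its predecessor, and
    a level is only usable if both dimensions are divisible by four.
    """
    levels = []
    w, h = width, height
    for _ in range(requested_levels):
        if w < 4 or h < 4 or w % 4 or h % 4:
            break                      # this level cannot be encoded on the GPU
        levels.append((w, h))
        w, h = w // 2, h // 2          # next level is half the size
    return levels


requested = 8
chain = gpu_mip_chain(576, 416, requested)
if len(chain) < requested:
    print(f"warning: only {len(chain)} of {requested} levels can be generated")
print(chain)  # [(576, 416), (288, 208), (144, 104), (72, 52)]
```

For a 576×416 image, the fifth level would be 36×26, and 26 is not divisible by four, so the chain stops after four levels.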
Automated Make Compatible Feature
With the GUI or CLI tools, users can compress HDR and LDR images interchangeably between any BCn codec, without first having to perform format transformations on the image source.
When GPU-based encoders are used, both the GUI and CLI tools use temporary CPU-based buffers to make the encoding seamless. To see how this works, try compressing an EXR file format image to formats like BC1 to BC5 or BC7 using the “Encode with” option for GPU_DirectX or GPU_OpenCL.
If you are unfamiliar with how to process textures, check the online tutorial on getting started using sample projects.
Using Codec Quality Settings
In this release, the quality setting for both GPU and CPU has been modified based on which BCn codec is being used to process images.
BC1, BC2, and BC3
There are three coarse settings, each of which uses a different algorithm for fast, medium, and slow processing speeds. Currently, the quality of each algorithm remains the same. In the next release this will change so that these codecs have a much smoother combined range of settings from 0.0 to 1.0.
- 0.0 to 0.1: Fast processing; a min/max algorithm is used.
- 0.101 to 0.6: Medium processing; a minimal distance algorithm is used.
- 0.601 to 1.0: Slow processing; a ramp algorithm is used.
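The three ranges above amount to a simple lookup from quality value to algorithm. The sketch below shows one way to express it; `bc1_to_bc3_algorithm` is a hypothetical helper, and treating values that fall between the listed ranges (for example 0.1005) as belonging to the next tier is an assumption, since the ranges above do not say.

```python
def bc1_to_bc3_algorithm(quality):
    """Map a BC1/BC2/BC3 quality value to the algorithm tier listed above."""
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be between 0.0 and 1.0")
    if quality <= 0.1:
        return "min/max (fast)"
    if quality <= 0.6:                 # assumption: gaps between ranges round up
        return "minimal distance (medium)"
    return "ramp (slow)"


for q in (0.05, 0.3, 0.9):
    print(q, "->", bc1_to_bc3_algorithm(q))
```

Because quality is currently the same within each tier, any two values in the same range should select the same algorithm.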
BC4 and BC5
The quality setting has no effect and performance remains the same. A simple packing algorithm is used and it does not have any refinement settings.
BC6 and BC7
These codecs support the full quality range from 0.0 to 1.0, and performance varies from fast to very slow. The algorithms used here are more complex and are based on pre-calculated ramps and block partitions.
Using Global Quality Settings
In the GUI, users can override all individual destination compression settings with a single global value before processing. Currently, only the quality settings can be overridden with a new global setting.
The process is as follows:
On the “Project Explorer”, click once on “Double Click here to add files…” to select its view options:
A new property view will be displayed:
Set a new “Quality” value to override all existing quality settings for textures used in the “Project Explorer” dialog. A value of zero will restore the old values and disable the global settings.
When an override is set, the quality settings displayed in the “Property View” for each destination setting in the “Project Explorer” will show a non-editable override setting.
Notice that the background color of “Double Click Here to add files” has also changed to indicate that an override setting is in effect. It returns to a white background when the override is turned off by setting its value to zero.
Once set, all texture processing will use the override value until it is turned off.
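The override rule can be summarized in a few lines. This is a sketch of the behavior described above, not the GUI's code; `effective_quality` is a hypothetical name.

```python
def effective_quality(texture_quality, global_quality=0.0):
    """Return the quality actually used for one destination texture.

    A global value of zero means the override is disabled, so the texture's
    own setting applies; any other value replaces it.
    """
    if global_quality == 0.0:
        return texture_quality
    return global_quality


per_texture = {"ruby_bc1.dds": 0.05, "ruby_bc7.dds": 0.8}
for name, quality in per_texture.items():
    print(name, effective_quality(quality, global_quality=0.5))  # override active
    print(name, effective_quality(quality))                      # override off
```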
Viewing Performance Numbers
An updated table view of the GUI Analysis Views is provided that includes GPU encode performance statistics.
With these stats, users can analyze the data generated on a specific GPU or CPU to compare performance against image quality.
KPerf(ms): an estimate of the time it takes to process 1000 blocks (4×4 texels each) using the current encoder and GPU setup, where Perf(ms) is the time in milliseconds taken to process a single block of 16 pixels.
HPC performance monitoring uses CPU timers, while OCL and DXC use GPU performance query timers.
MTx/s: the throughput, measured in millions of texels processed per second.
The overall time is a CPU-based measurement of the end-to-end time taken to process the image. It includes device setup, loading the image to the GPU, receiving the image from the GPU, and file I/O.
Using the Log Format Option
Users can generate log files for GPU and CPU usage stats and for test automation. The logging format has changed to include KPerf(ms) and MTx/s. To accommodate these changes, the revision of the log format has been updated from 1.0 to 1.1, as shown in this sample:
CompressonatorCLI -log -fd BC1 -EncodeWith OCL .\images\ruby.bmp ruby_bc1.dds
CompressonatorCLI Performance Log v1.1
Source : .\images\ruby.bmp, Height 416, Width 576, Linear size 0.936 MBytes
Destination : ruby_bc1.dds
Processed to : BC1 with 0 iteration(s) in 0.544 seconds
Using : OCL
Quality : 0.05
KPerf(ms) : 0.193
Mpx/s : 1242.000
MSE : 9.33
PSNR : 38.4
SSIM : 0.9848
Total time : 0.552 seconds
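For test automation, a log like the sample above can be read back into key/value pairs. The sketch below assumes that the log file holds one “Key : value” pair per line after the title line; that layout, and the helper name `parse_performance_log`, are assumptions for illustration, so check them against an actual log file.

```python
SAMPLE_LOG = r"""CompressonatorCLI Performance Log v1.1
Source : .\images\ruby.bmp, Height 416, Width 576, Linear size 0.936 MBytes
Destination : ruby_bc1.dds
Processed to : BC1 with 0 iteration(s) in 0.544 seconds
Using : OCL
Quality : 0.05
KPerf(ms) : 0.193
Mpx/s : 1242.000
MSE : 9.33
PSNR : 38.4
SSIM : 0.9848
Total time : 0.552 seconds"""


def parse_performance_log(text):
    """Split each 'Key : value' line into a dictionary entry."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(" : ")
        if separator:                  # skip lines without a separator (the title)
            fields[key.strip()] = value.strip()
    return fields


stats = parse_performance_log(SAMPLE_LOG)
print(stats["Using"], float(stats["KPerf(ms)"]), float(stats["PSNR"]))
```

Values are kept as strings and converted only where needed, because fields such as “Source” and “Processed to” mix text and numbers.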
Generating a CSV Log File
In addition to the -log option, two new command-line options have been added to output analysis data in a comma-separated file format. The commands below use -logcsv to generate a file suitable for any application that can display such files as tables:
CompressonatorCLI -logcsv -fd BC1 -EncodeWith HPC .\images\ruby.bmp ruby_bc1.dds
CompressonatorCLI -logcsv -fd BC1 -EncodeWith OCL .\images\ruby.bmp ruby_bc1.dds
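The two commands above differ only in the -EncodeWith value, so a script that compares encoders can build them in a loop. This sketch only assembles the argument lists using the flags shown in this tutorial; `build_command` is a hypothetical helper, and actually running the result (for example with `subprocess.run`) assumes CompressonatorCLI is installed and on the PATH.

```python
ENCODERS = ("HPC", "DXC", "OCL")   # values that appear in this tutorial


def build_command(source, destination, codec, encode_with, log_option="-logcsv"):
    """Assemble a CompressonatorCLI argument list like the samples above."""
    if encode_with not in ENCODERS:
        raise ValueError(f"unknown encoder: {encode_with}")
    return ["CompressonatorCLI", log_option, "-fd", codec,
            "-EncodeWith", encode_with, source, destination]


for encoder in ("HPC", "OCL"):
    command = build_command(r".\images\ruby.bmp", "ruby_bc1.dds", "BC1", encoder)
    print(" ".join(command))
    # To run it: subprocess.run(command, check=True)  (requires the CLI tool)
```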