New sample application
Today, we are happy to announce the availability of a new D3D12SimpleClassify sample, which shows the use of a work graph in a simple frame-based graphics application.
The sample implements a trivial material classification and shading scenario. To achieve this, it uploads a 2D texture containing an integer value at each pixel location. This texture is auto-generated at build time by running a Python script on the input image, material-ids.png, which is located in the project directory. The GPU runs a work graph containing two node arrays to process this input image. The first node array contains a single node, named ClassifyPixels, which uses the Broadcast launch mode. This node reads the input texture's data, interprets each pixel value as a material ID, and creates a node payload for the second node array, selecting the node in that array that corresponds to the material ID. The second node array, named ShadePixels, contains one node per unique material ID. Each node consumes an input record containing a pixel location and writes a unique color value at that location to the destination texture UAV. The shaders in the ShadePixels node array all use the Coalescing launch mode.
Finally, the destination texture is presented to the display window. Successful execution is determined by comparing the image displayed in the window against the source image.
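To make the data flow concrete, here is a minimal CPU-side model of the graph in Python. It is only a sketch, not the sample's HLSL: the function names, the `max_records` batch size, and the idea that the build script assigns one ID per distinct color (and that each ShadePixels node writes that same color back) are all assumptions made for illustration.

```python
from collections import defaultdict


def build_material_ids(image):
    """Build-time step (assumed): give each distinct color an integer ID."""
    palette = {}  # color -> material ID, numbered in order of first appearance
    ids = [[palette.setdefault(color, len(palette)) for color in row]
           for row in image]
    colors = {material_id: color for color, material_id in palette.items()}
    return ids, colors


def classify_pixels(material_ids):
    """Models ClassifyPixels: one record per pixel, routed by material ID."""
    records = defaultdict(list)  # node array index -> [(x, y), ...]
    for y, row in enumerate(material_ids):
        for x, material_id in enumerate(row):
            records[material_id].append((x, y))
    return records


def shade_pixels(records, colors, width, height, max_records=4):
    """Models ShadePixels: each node drains its records in batches."""
    output = [[None] * width for _ in range(height)]
    launches = 0
    for material_id, pixels in records.items():
        for start in range(0, len(pixels), max_records):
            launches += 1  # one modeled launch handles up to max_records
            for x, y in pixels[start:start + max_records]:
                output[y][x] = colors[material_id]
    return output, launches


def run_graph(image, max_records=4):
    """Run the whole modeled pipeline on a grid of RGB tuples."""
    ids, colors = build_material_ids(image)
    height, width = len(ids), len(ids[0])
    return shade_pixels(classify_pixels(ids), colors, width, height, max_records)
```

Under these assumptions, the output of `run_graph` should equal the source image, which mirrors the sample's success check. The model always fills each batch completely; on real hardware, how many records a Coalescing launch actually receives is up to the scheduler.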
Here is a diagram for the Work Graph dispatched by this application:
Radeon GPU Profiler support for Work Graphs
We would also like to introduce support for GPU Work Graphs in the Radeon GPU Profiler (RGP). As you may be aware, an application needs to be frame-based in order to be profiled with RGP, whether it uses DirectX or Vulkan. Now that both a DirectX and a Vulkan frame-based Work Graphs sample are available, it makes sense to talk about how you can use RGP with Work Graphs. RGP has supported Work Graphs since version 1.15, which can be downloaded as part of the Radeon Developer Tool Suite right here on GPUOpen.
New event types
First up, there are new event types that can show up in RGP’s event list. The main new event type that you will see for a DirectX Work Graph application is the SubDispatch event, as seen in the image below.
Each of these events represents a separate dispatch that originates from the GPU, rather than a normal dispatch that originates from a Dispatch (or similar) command recorded in a command buffer by the CPU. Similar to normal dispatches, each SubDispatch indicates the dispatch dimensions, letting you know how many work items are associated with each graph node. SubDispatches do not have an associated pipeline, so the ‘API PSO hash’ and ‘Driver internal pipeline hash’ fields in the Details panel are shown as “N/A” for these events.
RGP can also show the GPU work associated with setting up the backing memory for a graph dispatch. This work is displayed as the new event type InitGraphBackingStore. Below, you can see this event type in RGP’s event list. Since RGP only captures GPU work associated with a single frame, you will only see this event if the application initializes the backing memory in the frame that was captured. This is more likely to happen if the application performs the initialization every frame, but as you can see in the screenshot below, the GPU work associated with InitGraphBackingStore can be expensive (shown here as over 1200 μs per frame). Setting up the backing memory every frame is therefore unlikely to give the best performance. For the purposes of this demo, however, the D3D12SimpleClassify application can be run with a command line argument that forces it to set up the backing memory each frame, which makes this work easy to see in RGP. To reproduce this, run the sample with the -AlwaysResetBackingStore command line argument.
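To put that cost in perspective, the short Python sketch below models the two policies and estimates how much of a frame budget per-frame initialization would consume. The function names and the 60 frames-per-second target are assumptions for illustration; only the 1200 μs figure comes from the capture described above. In D3D12, initialization is requested with the D3D12_SET_WORK_GRAPH_FLAG_INITIALIZE flag when the work graph is set on the command list; how the sample connects its command line argument to that flag is not shown here.

```python
def needs_backing_store_init(frame_index, always_reset=False):
    """Return True if the backing memory should be initialized this frame.

    Models two policies: initialize once on the first frame (default),
    or initialize on every frame (always_reset=True).
    """
    return always_reset or frame_index == 0


def init_cost_fraction(init_cost_us, frames_per_second):
    """Fraction of the frame budget spent if initialization runs every frame."""
    frame_budget_us = 1_000_000 / frames_per_second
    return init_cost_us / frame_budget_us


# 1200 us out of a 60 fps frame budget (about 16,667 us) is roughly 7%.
print(f"{init_cost_fraction(1200, 60):.1%}")
```

With the initialize-once policy, the cost is paid on a single frame, which is also why the event only appears in an RGP capture if that frame happens to be the one captured.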
For Vulkan apps that use the new VK_AMDX_shader_enqueue extension, the equivalent of the SubDispatch event name seen in RGP is vkCmdSubDispatch and the equivalent of the InitGraphBackingStore event name is
Wavefront occupancy for Work Graphs
In addition to the new event types, RGP also shows the wavefront activity from the Work Graph. Below, we see a zoomed-in section of the Wavefront occupancy timeline, showing several SubDispatch events and the corresponding set of wavefronts launched from those dispatches.
If you’re interested in seeing the wavefront activity from individual SubDispatches, you can ask the Wavefront occupancy pane to color the wavefronts by event. This will give you a visual indicator of which waves come from which SubDispatch event.
A few RGP features are not yet fully functional for Work Graphs:
- RGP is currently unable to extract the disassembled ISA from shaders used in a work graph. Because of this, the ISA tab in the Pipeline state pane will be disabled for SubDispatch events.
- Because RGP does not have access to a graph node’s ISA, it is currently unable to support the Instruction timing feature. Thus, the Instruction timing pane in RGP will be blank for SubDispatch events.
- Similarly, the Cache Counter data displayed in RGP will be unavailable for parts of the timeline where graph dispatches are taking place.
You can still view ISA, Instruction timing and Counter data for non-Work Graph related events in an application that uses Work Graphs, and we plan to enable these features for SubDispatch events in future driver and RGP builds.
If you are experimenting with GPU Work Graphs in your applications and run into any issues with RGP, or if you have ideas for how RGP could better support Work Graphs, please let us know via our GitHub Issues page.
To learn more about getting started with GPU Work Graphs, don’t miss our GPU Work Graphs primer!