
Vulkan® Memory Allocator
VMA is our single-header, MIT-licensed, C++ library for easily and efficiently managing memory allocation for your Vulkan® games and applications.
This tutorial serves as a guide on how to best use the various Memory Heaps and Memory Types exposed in Vulkan on AMD drivers, starting with some high-level tips.
A great reminder from Axel Gneiting (leading Vulkan implementation in DOOM® at id Software), is to make sure to pool a group of resources, like textures and buffers, into a single memory allocation. On Windows® 7 for example, Vulkan memory allocations map to WDDM Allocations (the same lists seen in GPUView), and there is a relatively high cost associated for a WDDM Allocation as command buffers flow through the WDDM based driver stack. Having 256 MB per DEVICE_LOCAL allocation can be a good target, takes only 16 allocations to fill 4 GB.
When an application starts over-subscribing GPU-side memory, DEVICE_LOCAL memory allocations will fail. It is also possible that later during application execution, another application in the system increases its usage of GPU-side memory, resulting in dynamic over-subscribing of GPU-side memory. This case can result in an OS (for instance Windows® 7) to silently migrate or page GPU-side allocations to/from CPU-side as it time-slices execution of each application on the GPU. This can result in visible “hitching”. There is currently no method to directly query if the OS is migrating allocations in Vulkan. One possible workaround is for the app to detect hitching by looking at time-stamps, and then actively attempting to reduce DEVICE_LOCAL memory consumption when hitching is detected. For example, the application could manually move around resources to fully empty DEVICE_LOCAL allocations which can then be freed.
When targeting a memory surplus, using DEVICE_LOCAL+HOST_VISIBLE for CPU-write cases can bypass the need to schedule an extra copy. However in memory constrained situations it is much better to use DEVICE_LOCAL+HOST_VISIBLE as an extension to the DEVICE_LOCAL heap and use it for GPU Resources like Textures and Buffers. CPU-write cases can switch to HOST_VISIBLE+COHERENT. The number one priority for performance is keeping the high bandwidth access resources in GPU-side memory.
Driver Device Memory Heaps and Memory Types can be inspected using the Vulkan Hardware Database. For Windows AMD drivers, below is a breakdown of the characteristics and best usage models for all the Memory Types. Heap and Memory Type numbering is not guaranteed by the Vulkan Spec, so make sure to work from the Property Flags directly. Also note memory sizes reported in Vulkan represent the maximum amount which is shared across applications and driver.
vkAllocateMemory()
allocation is a good starting point for collections of buffers and imagesvkMapMemory()
to map into Host system address spacevkAllocateMemory()
allocationvkMapMemory()
to map into Host system address spacevkMapMemory()
Choosing the correct Memory Heap and Memory Type is a critical task in optimization. A GPU like Radeon™ Fury X for instance has 512 GB/s of DEVICE_LOCAL bandwidth (sum of any ratio of read and write) but the PCIe bus supports at most 16 GB/s read and at most 16 GB/s write for a sum of 32 GB/s in both directions.