Announcing Vulkan Memory Allocator 3.0.0 and Direct3D 12 Memory Allocator 2.0.0

Adam Sawicki

Originally posted March 25, 2022

Vulkan® Memory Allocator

When we first published version 1.0.0 of the Vulkan® Memory Allocator library in July 2017, we didn’t expect it to be so successful and widely used. It started as an R&D project in the AMD Game Engineering team and is still developed here, alongside our main job: helping game developers optimize their games for stability, correctness, performance, and feature set on AMD hardware. You can learn about our daily work from the article “Opinion – International Women’s Day – Working at AMD DevTech: Graphics Cards, Chips and Games” by our colleague Lou Kramer.

Thanks to the support of the developer community, VMA has over the years become the de facto standard among Vulkan developers, as it helps to solve a problem that each of them has to face: allocating device memory blocks and sub-allocating parts of them for buffers and images. In our team we focus mainly on PC and Windows®, but the library is written in pure C++, so it is compatible with any platform and compiler where Vulkan is available, whether it is Windows, Linux, macOS, or Android. It also works with any Vulkan-supporting GPU, whether a discrete graphics card, graphics integrated with the processor, or a mobile SoC.

As of the date of publishing, the project has been starred by 1.5k users, forked almost 200 times, and has had more than 240 issues and pull requests opened so far, most of them already resolved. Their number is growing every week. GitHub currently shows 47 contributors who proposed some smaller or larger fixes to the code. Vulkan Memory Allocator has also been selected as the number one third-party “Vulkan ecosystem tool/layer (not included in the SDK)” with the most people calling it “Very Useful” in the results of the 2021 Vulkan Ecosystem and SDK Survey.

Introducing Vulkan Memory Allocator 3.0.0

Today, we would like to announce the release of a new major version of the library: 3.0.0.

We have worked very hard in the past few months to finish new additions and improvements to the library. Some of these have been in the works for months or even years. Incrementing the major version number allowed us to make some larger changes that required breaking backward compatibility. Below you can find an overview of the most notable changes in this new version. A lot of this new code has been developed by our intern, Marek Machliński.

There are a few things that haven’t changed with this release, though. The library is still a single, “STB-style” header file that you can just include in your project. It is still fully open source, available under the permissive MIT license. It is always kept in a good state, with continuous integration set up for Windows and Linux, so you can use the latest code version from the “master” branch. Finally, it is still in continuous development, with bug fixes and improvements pushed to GitHub regularly.

Documentation is also always kept up-to-date – both the description of the API functions and structures, as well as general chapters. You can find this documentation in the Doxygen-format comments inside the library code. You can generate HTML or some other format out of it yourself, or just browse it online at: Vulkan Memory Allocator.

Let’s take a look at some of the new features in this release.

New memory usage flags

Different types of GPUs expose different sets of memory heaps and types in the Vulkan API. Helping you select the optimal memory type is one of the main features of the library. Up until now, developers have had to use flags like VMA_MEMORY_USAGE_CPU_TO_GPU or VMA_MEMORY_USAGE_CPU_ONLY, which were quite confusing. In 3.0.0 we have redesigned these flags. You are now encouraged to just use VMA_MEMORY_USAGE_AUTO and let VMA choose the optimal memory type for you. Creating buffers and textures is now easier than ever!

For example, if you need a buffer that will be filled with a copy operation and then used as a vertex buffer, creating it takes a single function call:

// Describe the buffer: a vertex buffer that is the destination of a transfer (copy).
VkBufferCreateInfo bufInfo = {VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO};
bufInfo.size = ...;
bufInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT;

// Let VMA choose the memory type based on the intended usage.
VmaAllocationCreateInfo allocInfo = {};
allocInfo.usage = VMA_MEMORY_USAGE_AUTO;

// Create the buffer, allocate memory for it, and bind them together.
VkBuffer buf;
VmaAllocation alloc;
vmaCreateBuffer(allocator, &bufInfo, &allocInfo, &buf, &alloc, nullptr);

If you want to be able to map an allocation to access its data via a CPU-side pointer, you now need to additionally set one of two flags in allocInfo.flags: VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT or VMA_ALLOCATION_CREATE_HOST_ACCESS_RANDOM_BIT.

You could say it makes things more complicated, as it requires two flags instead of just one, but we think it better reflects what happens under the hood. By knowing the required kind of access on the CPU (VMA_ALLOCATION_CREATE_HOST_ACCESS_ flags) and on the GPU (by inspecting VkBufferCreateInfo::usage or VkImageCreateInfo::usage), the library has enough information to automatically select the best memory type for the new resource. Additionally, the library can now distinguish mappable from non-mappable allocations, which enables it to keep them separate and better optimize in some cases, especially on platforms with integrated graphics and unified memory.
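To make the selection logic described above more concrete, here is a minimal, self-contained sketch of how a scheme based on required and preferred property flags can pick a memory type. It is not VMA's implementation: the HostAccess enum, the chooseMemoryType function, and the policy inside it are hypothetical simplifications, and the flag constants are stand-ins that mirror the values of VkMemoryPropertyFlagBits so that the example compiles without the Vulkan headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-ins mirroring the values of VkMemoryPropertyFlagBits.
constexpr uint32_t DEVICE_LOCAL  = 0x1;
constexpr uint32_t HOST_VISIBLE  = 0x2;
constexpr uint32_t HOST_COHERENT = 0x4;
constexpr uint32_t HOST_CACHED   = 0x8;

// Hypothetical summary of the CPU access the caller asked for.
enum class HostAccess { None, SequentialWrite, Random };

// Counts the bits set in v.
static int countBits(uint32_t v) {
    int n = 0;
    while (v != 0) { n += static_cast<int>(v & 1u); v >>= 1; }
    return n;
}

// Returns the index of the chosen memory type, or -1 if none is acceptable.
// typeFlags[i] holds the property flags of memory type i;
// allowedTypeBits plays the role of VkMemoryRequirements::memoryTypeBits.
int chooseMemoryType(const std::vector<uint32_t>& typeFlags,
                     uint32_t allowedTypeBits, HostAccess access) {
    assert(typeFlags.size() <= 32);
    uint32_t required = 0, preferred = 0;
    switch (access) {
    case HostAccess::None:            preferred = DEVICE_LOCAL; break;
    case HostAccess::SequentialWrite: required = HOST_VISIBLE; preferred = DEVICE_LOCAL; break;
    case HostAccess::Random:          required = HOST_VISIBLE; preferred = HOST_CACHED; break;
    }
    int best = -1;
    int bestCost = std::numeric_limits<int>::max();
    for (std::size_t i = 0; i < typeFlags.size(); ++i) {
        if ((allowedTypeBits & (1u << i)) == 0) continue;        // resource cannot live here
        if ((typeFlags[i] & required) != required) continue;     // missing a mandatory property
        const int cost = countBits(preferred & ~typeFlags[i]);   // missing preferred properties
        if (cost < bestCost) { best = static_cast<int>(i); bestCost = cost; }
    }
    return best;
}
```

With a layout typical of a discrete GPU (type 0 device-local; type 1 host-visible and coherent; type 2 device-local, host-visible, and coherent; type 3 host-visible, coherent, and cached), this sketch picks type 0 when no CPU access is requested, type 2 for sequential writes, and type 3 for random access.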

Old VMA_MEMORY_USAGE_ values still exist in the API and work as before, so you don’t need to upgrade all your code to the new ones right now, but it is recommended. You can find more information about these new flags in the documentation chapters: “Choosing memory type” and “Recommended usage patterns”.

Powerful custom memory pools

Allocating all kinds of resources, big and small, out of default memory pools is recommended for most cases, but the library also supports custom pools. The feature set of custom pools has been significantly extended in VMA 3.0.0. Now they not only allow you to keep some kind of resources separate, reserve some minimum memory or limit the maximum amount of memory they can take, but they also allow you to pass additional parameters to all allocations made in a pool, some of them unavailable when making normal allocations.

These additional parameters include:

VmaPoolCreateInfo::priority – this allows you to specify the priority which will be set automatically on all the pool allocations when the “VK_EXT_memory_priority” extension is enabled.
VmaPoolCreateInfo::minAllocationAlignment – this allows you to enforce an additional, minimum alignment for allocations.
VmaPoolCreateInfo::pMemoryAllocateNext – this lets you attach an additional structure to the VkMemoryAllocateInfo::pNext chain (for example VkExportMemoryAllocateInfoKHR).

The last two features were added by request from the professional graphics community, which needs Vulkan-OpenGL interop.

Custom pools now also support dedicated allocations, so even if you need each one of your special-purpose buffers and images to have its own device memory block, you can still use the custom pools feature to pass extra allocation parameters to them. For more information, see the documentation chapter “Custom memory pools”.

New defragmentation API

If you develop an open-world game with buffers and textures of various sizes streamed in and out of GPU memory at runtime, you may experience fragmentation. This is an unwanted situation in which there are a lot of memory blocks which are almost empty but still hold a few allocations scattered in various places with small empty regions between them. In this scenario these empty regions are not big enough to fit a new, big allocation.
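The situation described above can be illustrated with a small, self-contained sketch that measures a single memory block: it reports both the total free space and the largest contiguous free region. The Range and FreeStats types and the measureFree function are hypothetical names made up for this illustration and are not part of VMA.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// An allocation inside a block: [offset, offset + size).
struct Range { uint64_t offset; uint64_t size; };

struct FreeStats {
    uint64_t totalFree;    // sum of all free bytes in the block
    uint64_t largestFree;  // size of the largest contiguous free region
};

// allocs must be sorted by offset and must not overlap.
FreeStats measureFree(const std::vector<Range>& allocs, uint64_t blockSize) {
    FreeStats stats{0, 0};
    uint64_t cursor = 0;  // end of the previous allocation
    for (const Range& a : allocs) {
        assert(a.offset >= cursor && a.offset + a.size <= blockSize);
        const uint64_t gap = a.offset - cursor;
        stats.totalFree += gap;
        stats.largestFree = std::max(stats.largestFree, gap);
        cursor = a.offset + a.size;
    }
    const uint64_t tail = blockSize - cursor;  // free space after the last allocation
    stats.totalFree += tail;
    stats.largestFree = std::max(stats.largestFree, tail);
    return stats;
}
```

For a 1024-byte block holding four 64-byte allocations at offsets 128, 384, 640, and 896, this reports 768 free bytes in total but a largest free region of only 192 bytes, so a 256-byte allocation would not fit even though three quarters of the block is empty.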

For such situations, VMA provides a few defragmentation algorithms to mitigate the problem. During the evolution of VMA, we have tried several approaches to the defragmentation API, but it has been a difficult problem because VMA is a low-level library and cannot perform full defragmentation on its own. It needs the user’s cooperation to recreate resources and copy the actual data.

VMA 3.0.0 offers a new API for defragmentation. It enables you to defragment a custom memory pool or all the default pools. The defragmentation process can either be performed iteratively, with a limited number of allocations or bytes moved in a single pass, or asynchronously, using VMA from multiple threads.
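The idea of iterative defragmentation with a per-pass limit can be sketched with a toy model. The code below is not the VMA API and does not reflect its algorithms: Alloc and compactPass are hypothetical names, and the "move" is just an offset update, whereas in a real application each move means recreating the resource and copying its data. The sketch only shows how a budget of moves per pass spreads the work over several passes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// An allocation inside one memory block: [offset, offset + size).
struct Alloc { uint64_t offset; uint64_t size; };

// Performs one pass that slides allocations toward offset 0, moving at most
// maxMoves of them. allocs must be sorted by offset and must not overlap.
// Returns the number of allocations moved; 0 means the block is fully compacted.
std::size_t compactPass(std::vector<Alloc>& allocs, std::size_t maxMoves) {
    std::size_t moves = 0;
    uint64_t cursor = 0;  // first byte not yet occupied by a compacted allocation
    for (Alloc& a : allocs) {
        if (a.offset != cursor) {
            if (moves == maxMoves) break;  // budget for this pass is used up
            a.offset = cursor;             // a real application would copy the data here
            ++moves;
        }
        cursor = a.offset + a.size;
    }
    return moves;
}
```

Calling compactPass repeatedly, for example once per frame, until it returns 0 packs all allocations at the start of the block while bounding the cost of each individual pass.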

A choice of defragmentation algorithms is available, from FAST, which will only move the most obvious allocations to free memory blocks that are almost empty, through BALANCED and FULL, up to EXTENSIVE, which arranges linear and optimal resources separately to avoid buffer-image granularity conflicts. For more information, see the documentation chapter “Defragmentation”.

New statistics and budget API

It is recommended that you keep track of how many resources are allocated and how much memory they occupy, not only for statistical and debugging purposes, but also to be sure you are not exceeding the available memory budget, as this could cause significantly reduced performance or even application crash. Over the years of incremental development and preserving backward compatibility, VMA evolved a few different structures and functions for obtaining memory usage statistics. Incrementing the major version gave us the opportunity to break backward compatibility in order to clean and redesign the API.

Statistics are now grouped into several “levels of detail”. The most basic structure, VmaStatistics, provides the total number of allocations and memory blocks, as well as the number of bytes these allocations and blocks occupy. VmaStatistics is guaranteed to be fast to calculate. It can be calculated for a custom pool or for the entire allocator. In the latter case, it is grouped into structures called VmaBudget, one per Vulkan memory heap, which also provide the current memory usage and the budget available to the application in that heap, as reported by the “VK_EXT_memory_budget” extension. If you enable this extension and inform VMA about it using the proper VmaAllocatorCreateFlagBits flag, the library automatically uses this extension. If not, you can still use the same API, but memory usage and budget will be estimated based on internal library statistics and Vulkan memory heap sizes.
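As an illustration of how an application might act on per-heap usage and budget numbers, here is a minimal sketch of a check made before a large allocation, together with a fallback estimate for when no driver-reported budget is available. HeapBudget, estimateBudget, and fitsInBudget are hypothetical names, not VMA structures or functions, and the 80% fraction used for the estimate is an arbitrary choice made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-heap numbers, similar in spirit to what a budget query returns.
struct HeapBudget {
    uint64_t usage;   // bytes currently used by the application in this heap
    uint64_t budget;  // bytes the application can use before running into trouble
};

// Fallback estimate when the driver does not report a budget: usage is taken
// from the allocator's own bookkeeping, and the budget is assumed to be a fixed
// fraction (80% here, chosen for illustration only) of the heap size.
HeapBudget estimateBudget(uint64_t heapSize, uint64_t bytesInOwnBlocks) {
    return HeapBudget{bytesInOwnBlocks, heapSize / 10 * 8};
}

// Returns true if allocating newBytes more would stay within the budget.
bool fitsInBudget(const HeapBudget& b, uint64_t newBytes) {
    return b.usage <= b.budget && newBytes <= b.budget - b.usage;
}
```

A streaming system could run such a check once per frame and postpone loading or evict unused resources when it fails, rather than letting the heap become oversubscribed.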

More detailed statistics are available in the form of the structure VmaDetailedStatistics. They can be “calculated” (as opposed to “get”) for a specific custom pool or the entire allocator. In the latter case, they are grouped into the structure VmaTotalStatistics – per memory type, memory heap, and total. They contain additional information like the number of free regions between allocations, minimum and maximum allocation size, and so on. However, they do need to traverse some internal data structures, so they may be slower to calculate and should only be used for debugging purposes.

Finally, the internal state of the library, including the full list of allocations and their custom string names, can be dumped into a JSON document. This hasn’t changed since the previous version, although the specific JSON format has. The library also offers a Python script that visualizes this JSON file as a pretty picture. This script can be found in the repository at “tools\VmaDumpVis\VmaDumpVis.py”. For more information, see the documentation chapter “Statistics”.

Virtual allocator

When developing your Vulkan application, it is recommended that you allocate one or a few large buffers and then sub-allocate parts of them for different pieces of data rather than creating a separate buffer for each one of them. If they are of different sizes, created and freed in some random order, this second-level allocation also needs a fully-featured allocator. Why not use the same one we have already implemented?

To fulfil this need, VMA now exposes its core allocation algorithm in the form of a “virtual allocator”, which can be used to allocate any pieces of memory, whether it’s part of one large Vulkan buffer or something completely unrelated to Vulkan and the GPU. You don’t even need to initialize Vulkan and create the main allocator object to use this feature. All you need to do is to create a VmaVirtualBlock object and allocate pieces of its virtual address space to obtain lightweight handles of type VmaVirtualAllocation. For more information, see the documentation chapter “Virtual allocator”.
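The concept of a virtual allocator, one that hands out offsets in an abstract address space without ever touching memory, can be sketched in a few dozen lines. The class below is a toy model written for this illustration only: ToyVirtualBlock and its methods are hypothetical names, it uses a simple first-fit search rather than the algorithm VMA uses, and it ignores alignment. For the actual interface, refer to the documentation chapter mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Toy "virtual block": manages offsets only, never dereferences any memory.
class ToyVirtualBlock {
public:
    explicit ToyVirtualBlock(uint64_t size) { free_[0] = size; }

    // First-fit search. On success stores the offset in outOffset and returns true.
    bool allocate(uint64_t size, uint64_t& outOffset) {
        assert(size > 0);
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second >= size) {
                const uint64_t offset = it->first;
                const uint64_t remaining = it->second - size;
                free_.erase(it);
                if (remaining > 0) free_[offset + size] = remaining;  // keep the tail free
                used_[offset] = size;
                outOffset = offset;
                return true;
            }
        }
        return false;  // no free region is large enough
    }

    // Frees the allocation that starts at offset and merges adjacent free regions.
    void release(uint64_t offset) {
        auto u = used_.find(offset);
        assert(u != used_.end());
        const uint64_t size = u->second;
        used_.erase(u);
        auto it = free_.emplace(offset, size).first;
        auto next = std::next(it);
        if (next != free_.end() && it->first + it->second == next->first) {
            it->second += next->second;  // merge with the following free region
            free_.erase(next);
        }
        if (it != free_.begin()) {
            auto prev = std::prev(it);
            if (prev->first + prev->second == it->first) {
                prev->second += it->second;  // merge with the preceding free region
                free_.erase(it);
            }
        }
    }

private:
    std::map<uint64_t, uint64_t> free_;  // offset -> size of each free region
    std::map<uint64_t, uint64_t> used_;  // offset -> size of each live allocation
};
```

The returned offsets can be interpreted however the application likes, for example as byte offsets into one large vertex buffer or as indices into a descriptor array.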

New allocation algorithm

Last but not least, we have completely rewritten the allocation algorithm. Changes in the public API of the library require careful documentation, but sometimes the most important changes are invisible on the surface. A flexible and modular internal architecture allowed us to experiment and prototype until we decided that our implementation of the Two-Level Segregated Fit (TLSF) algorithm is good enough to replace the old algorithm as the default used everywhere throughout the library, including in default pools, custom pools, and the virtual allocator.
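For readers unfamiliar with TLSF, the core idea is that every free region is filed into a free list selected by two indices computed directly from its size, and bitmaps record which lists are non-empty, so a suitable list is found with a few bit operations instead of a search through all free regions. The sketch below shows this generic size-to-index mapping as it is commonly described; it is not taken from VMA's source, the names are hypothetical, and the choice of 32 second-level subdivisions is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits of the second-level index: 2^5 = 32 subdivisions of each
// power-of-two size range (an illustrative choice).
constexpr unsigned kSecondLevelBits = 5;

struct TlsfIndex { unsigned first; unsigned second; };

// Maps a size to (first-level, second-level) free-list indices.
// first  = floor(log2(size)), selecting the range [2^first, 2^(first+1)).
// second = which of the 32 equal slices of that range the size falls into.
TlsfIndex mapSize(uint64_t size) {
    assert(size >= (1ull << kSecondLevelBits));  // tiny sizes need separate handling
    unsigned fl = 63;
    while (((size >> fl) & 1ull) == 0) --fl;     // position of the highest set bit
    const unsigned sl = static_cast<unsigned>(
        (size >> (fl - kSecondLevelBits)) & ((1ull << kSecondLevelBits) - 1));
    return TlsfIndex{fl, sl};
}

// Returns the lowest index >= start whose bit is set in bitmap, or -1 if none.
// Real implementations use a hardware bit-scan instruction instead of a loop.
int findFromBit(uint32_t bitmap, unsigned start) {
    assert(start < 32);
    uint32_t masked = bitmap & (~0u << start);
    if (masked == 0) return -1;
    int index = 0;
    while ((masked & 1u) == 0) { masked >>= 1; ++index; }
    return index;
}
```

With these constants, sizes 1024 through 1055 map to indices (10, 0), 1536 maps to (10, 16), and 2048 starts the next first-level range at (11, 0).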

This algorithm has some great properties that make it much faster than the old one. When allocating GPU memory blocks and creating Vulkan buffers and images, the time spent searching for the best place for a new allocation was rarely a performance bottleneck, but we were receiving some signals that the performance overhead was significant in certain cases, for example when freeing lots of allocations at once or when allocating small buffers on platforms with a large bufferImageGranularity Vulkan limit. With the new TLSF algorithm, these performance problems should now be gone.

Other changes

There are many other changes that have made it into the library throughout the years of development, far too many to describe them all here. Besides a few additions to the API, countless improvements were made to the implementation internals, including bug fixes, performance optimizations, code refactoring, and ensuring compatibility with various platforms, compilers, and GPUs. Many of them were proposed by the developer community as issues and pull requests on GitHub.

Incrementing the major version number and breaking backward compatibility allowed us to not only add and redesign elements of the library interface, but also remove a few of them. We decided to abandon support for some features which were not widely used, but added a lot of complexity to the library code and required constant testing and maintenance. Removing them allowed us to simplify and shorten the code significantly. The biggest features that went away are “record & replay” and “lost allocations”.

You can find a more formal list of changes in VMA 3.0.0 compared to the previous release on the page: VMA 3.0.0 release.

Direct3D®12 Memory Allocator

D3D12MA, the younger sister of VMA, offers similar features for developers using Direct3D 12 to render graphics. Just like VMA, it is available on GitHub under the MIT license and offers documentation generated from Doxygen-style comments, which is browsable online here: D3D12 Memory Allocator.

Reaching feature parity with VMA was our goal, but we are not going to merge these two libraries into one. There are just too many differences between Vulkan and DX12 regarding memory management. The API of the library is also different in order to better blend with the graphics API. For example, in VMA the whole interface is in C with global functions like vmaCreateImage and opaque handles like VmaAllocation, while the D3D12MA interface is object-oriented, with functions like Allocator::CreateResource, to resemble ID3D12Device::CreateCommittedResource. You can even use your favourite smart pointers for COM objects to hold references to the objects of this library.

Once the main Allocator object is created, allocating a texture or a buffer is almost as simple as creating it as a committed resource, with the library handling all of the complexity, including allocating a large ID3D12Heap and creating placed resources in it, respecting D3D12_RESOURCE_HEAP_TIER, applying the “small texture alignment” optimization where applicable, and so on.

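An example of this is shown below:

// Describe the resource (buffer or texture) to create.
D3D12_RESOURCE_DESC resDesc = ...;

// Request memory from the default (GPU-local) heap type.
D3D12MA::ALLOCATION_DESC allocDesc = {};
allocDesc.HeapType = D3D12_HEAP_TYPE_DEFAULT;

// Create the resource together with the allocation that backs it.
ID3D12Resource* res;
D3D12MA::Allocation* alloc;
HRESULT hr = allocator->CreateResource(&allocDesc, &resDesc,
    D3D12_RESOURCE_STATE_COPY_DEST,
    NULL, &alloc, IID_PPV_ARGS(&res));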

Introducing D3D12MA 2.0.0

Today, we are also announcing a new major release of D3D12 Memory Allocator, version 2.0.0.

So much has changed since the first release that it doesn’t make much sense to compare the differences. Let’s focus on the features that the library now provides.

Just like VMA, D3D12MA now also offers powerful custom pools. They give you an opportunity to not only keep certain resources together, reserve some minimum or limit the maximum amount of memory they can take, but also to pass additional allocation parameters unavailable to simple allocations. Among them, probably the most interesting is POOL_DESC::HeapProperties, which allows you to specify parameters of a custom memory type, which may be useful on UMA platforms. Committed allocations can now also be created in custom pools.

The API for statistics and budget has been redesigned similarly to VMA. The structure Statistics offers basic statistics that are fast to calculate. It is available for a custom pool as well as the entire allocator object. In the latter case, the statistics are returned indirectly via the Budget structure, which also carries the current memory usage and available budget, as reported by IDXGIAdapter3::QueryVideoMemoryInfo, separately for local (GPU) and non-local (system) memory. More detailed statistics can be calculated and returned in the form of the structures DetailedStatistics and TotalStatistics. D3D12MA also supports a JSON dump that can list all the allocations, their types, sizes, and custom string names. Additionally, the repository contains a Python script to visualize this JSON file as a picture.

The library also exposes its core allocation algorithm via an interface called a “virtual allocator”. This can be used to allocate pieces of custom memory or whatever you like, whether it’s one big D3D12 buffer to hold all the bottom-level acceleration structures used for ray tracing, individual descriptors and their groups (descriptor tables) in an ID3D12DescriptorHeap, or something completely unrelated to graphics. The allocation algorithm has also been replaced with the new, more efficient TLSF.

Finally, we added support for defragmentation to D3D12MA, just like we did in VMA.