Foreword

This is a guest post from Sebastian Aaltonen, co-founder of Second Order and previously senior rendering lead at Ubisoft®. Second Order published their first game, Claybook, in 2018, and it’s out now on Steam™, PlayStation 4™ and Xbox One™. Claybook is built using Unreal Engine 4, and early last year Sebastian wrote a post for GPUOpen on optimising its novel renderer for GPU occupancy and resource usage.

This post isn’t another one about the technology in the game, though. Instead, it covers something that any Unreal Engine 4 developer should be interested in: optimising building the engine and its asset production process on modern high-performance, highly multi-threaded workstations using AMD Ryzen Threadripper processors. The core concepts also apply to most game engine and asset generation systems in general, so whether you’re a UE4 developer or not, dig in to find out how Sebastian optimised his Threadripper system to help build their game.

Introduction

We built our first Ryzen Threadripper 1950X workstation one year ago, two months after AMD had released their new high-end desktop platform. We built a second Threadripper workstation a few months later after seeing how it could help speed up development. Now that we have used these two workstations for one full year to build our game, Claybook, it is time to talk about our experience with Threadripper.

New versus old

When we started Second Order Ltd 3 years ago, as a small 2-man studio, we bought the fastest consumer Intel® CPUs available. At that time, that meant the brand-new Core i7-6700K, which is a ‘Skylake’ processor with 4 cores and 8 threads, running at a 4.0 GHz base clock and up to 4.2 GHz turbo. This is still a great CPU for gaming, and Intel is still using much the same core microarchitecture in their new chips, including the latest Skylake-derived Core i9-9900K, codenamed Coffee Lake.

Unreal Engine documentation currently recommends a high core count Intel Xeon CPU if you are compiling UE4 from C++ source. This recommendation was spot on: compile time of Unreal Engine 4 with the Core i7-6700K was over 40 minutes.

I had already formed good relations with AMD and other hardware vendors when I was a senior rendering lead at Ubisoft, and AMD provided a Ryzen Threadripper 1950X to help us evaluate the new platform and see if it could improve how we made our game. After spending some time with Threadripper to understand its strengths and weaknesses, and after tweaking the system, we managed to improve Unreal Engine 4 compile times by more than 3x. This was such a big improvement that we decided to buy another Threadripper system to improve productivity for both of us.

Here are the components of the systems that generated the data below:

| Component | Details |
| --- | --- |
| Processors | Intel Core i7-6700K (4C/8T, 4.0/4.2 GHz base/turbo) |
| | AMD Ryzen Threadripper 1950X (16C/32T, 3.4/4.0 GHz base/turbo) |
| Memory | 4 x 8 GiB DDR4-2133 |
| | 4 x 8 GiB DDR4-2400 |
| | 4 x 8 GiB DDR4-2667 |
| Storage | 2 x Samsung 850 PRO 512GB SSD |
| Mainboards | ASUS Z170 Pro Gaming (Core i7) |
| | ASRock Taichi X399 (Ryzen Threadripper) |

First UE4 C++ Engine Compile Comparison

First, we equipped our Ryzen Threadripper 1950X system with the existing DDR4-2133 memory DIMMs we harvested from one of the Core i7-based systems that was replaced by the Threadripper. It was time to re-compile the UE4 solution (which includes Claybook, so it’s the full game compile). Here’s what we found:

| Processor | Threads | Time | Performance |
| --- | --- | --- | --- |
| Intel Core i7-6700K | 6 | 40m 18s (2,418s) | 1x (baseline) |
| Intel Core i7-6700K | 8 | 37m 12s (2,232s) | 1.08x |
| AMD Ryzen Threadripper 1950X | 24 | 14m 52s (892s) | 2.71x |
| AMD Ryzen Threadripper 1950X | 32 | 14m 43s (883s) | 2.74x |

We compared both CPUs at their maximum thread count, and also at a thread count that keeps the desktop and other applications highly responsive. When the Core i7 compiled UE4 with all 8 threads, the whole system became very unresponsive: even the web browser and YouTube videos stuttered. An 8-thread compile is therefore mostly useful during lunch breaks or meetings. When you drop the compile thread count to 6, the desktop stuttering is gone and you can use light applications such as the web browser while compiling. Thus, we selected the 6-thread compile on the Core i7 as our 1.00x baseline.

Ryzen Threadripper 1950X compiling using the full 32 threads still maintains a responsive desktop and web browser. You can compile and browse the web at the same time without any problems. If you drop the compile thread count to 24, you can also do heavy background tasks during compile without any problems. Dropping to 24 threads results in only a slightly longer compile time.

I have played multiple Overwatch competitive matches at 4K resolution during a UE4 rebuild on the Ryzen Threadripper 1950X system, and the game maintained 60 Hz frame lock at all times. This is excellent, especially if you are developing a CPU-heavy AAA game and wish to test your game during the C++ solution rebuild.

Core Utilization and Clocks

Unreal Engine uses a custom build tool called UBT that scales very well to high core counts. During the bulk compile, UBT occupies all available threads and manages to hit 100% CPU utilization on all cores. There are, however, some tasks at the beginning of the build process (notably UnrealHeaderTool) that are single-threaded. When doing a full recompile, around 30 seconds of build time is spent in single-threaded code.

Intel Core i7-6700K compiling with all 8 threads. Clock rate is steady at 4 GHz. No turbo clock is achieved. Dark blue is time spent in the kernel, light blue is time spent in user code.
AMD Threadripper 1950X compiling with all 32 threads. Clock rate is 3.67 GHz, +270 MHz turbo over the 3.4 GHz base clock. Dark blue is time spent in the kernel, light blue is time spent in user code.

Heatsinks and Noise

Surprisingly, the Threadripper 1950X system is noticeably quieter than our Core i7 systems during max load.

Our Core i7 systems used Corsair H100i water coolers and both of our Threadripper systems were equipped with a Noctua NH-U14S air cooler. The Noctua has one large 140mm fan that spins more slowly than the two smaller 120mm fans on the Corsair H100i. The Noctua is able to keep the CPU temperature low even when batch compiling for all four of Claybook’s supported platforms in a row. It was definitely a surprising result. Noise was not a problem at all with Threadripper, assuming you have a large high-quality heatsink like the Noctua NH-U14S installed.

Optimizing the Threadripper

The Intel Core i7 is a consumer CPU and runs at maximum performance out of the box. Threadripper, however, needs more tweaking to reach the best possible workstation performance. By default, some Ryzen Threadripper BIOSes have settings that favor gaming over the massively parallel C++, shader and asset compiling tasks that are very common in game development. Those tasks don’t usually share much data between threads, which lets us enable NUMA mode to improve performance.

Threadripper is also very sensitive to DDR4 speed, even more so than consumer Ryzen CPUs, because of the system topology and how system RAM is connected to the processor. Higher DDR4 speed does double duty on Threadripper: it also increases the speed of the Infinity Fabric interconnect between the active dice.

Unreal Engine: Enabling all compiler threads

The first thing you want to do is to configure UE4 to make better use of your logical cores. By default, UE4 sets the compile thread count to the number of physical cores on CPUs with up to 4 cores, and to 3/4 of all of the cores (including logical cores) for CPUs with more than 4. You can improve compile performance by increasing these values.
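As a rough illustration, the default rule described above can be written as a small function. This is a minimal sketch based only on the description in this post, not on the UBT source code, and the function name `default_compile_threads` is made up for the example.

```python
def default_compile_threads(physical_cores: int, logical_cores: int) -> int:
    """Default compile thread count as described in the text.

    Assumption: this mirrors the prose description only; the real
    UnrealBuildTool logic may differ between engine versions.
    """
    if physical_cores <= 4:
        # Small CPUs: one compile thread per physical core.
        return physical_cores
    # Larger CPUs: three quarters of all logical cores.
    return (logical_cores * 3) // 4


print(default_compile_threads(4, 8))    # Core i7-6700K -> 4
print(default_compile_threads(16, 32))  # Threadripper 1950X -> 24
```

Under this rule the 1950X would compile with 24 of its 32 hardware threads by default, which is why raising the limits is worthwhile.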

Modify UE4 BuildConfiguration.xml with your preferred settings:

<?xml version="1.0" encoding="utf-8" ?>
<Configuration xmlns="https://www.unrealengine.com/BuildConfiguration">
  <ParallelExecutor>
    <ProcessorCountMultiplier>2</ProcessorCountMultiplier>
    <MaxProcessorCount>30</MaxProcessorCount>
  </ParallelExecutor>
</Configuration>

In this example I make use of SMT (Intel calls it Hyper-Threading) by setting ProcessorCountMultiplier to 2. This results in the compiler using all 32 hardware threads of the 16-core Threadripper 1950X processor. I then set MaxProcessorCount to 30 to cap the compile thread count at 30. This leaves 2 logical cores entirely free for other software, resulting in a smooth experience when multitasking during project compilation.
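To see how the two settings interact, here is a minimal sketch that assumes the multiplier scales the physical core count and MaxProcessorCount then caps the result, as the paragraph above describes. The function name `configured_compile_threads` is hypothetical and the formula is an assumption, not a copy of the UBT implementation.

```python
def configured_compile_threads(physical_cores: int,
                               processor_count_multiplier: float,
                               max_processor_count: int) -> int:
    """Effective compile thread count for a given BuildConfiguration.xml.

    Assumption: multiplier is applied to physical cores, then the
    result is capped by max_processor_count (never below 1).
    """
    scaled = int(physical_cores * processor_count_multiplier)
    return max(1, min(scaled, max_processor_count))


# Settings from the XML above on a 16-core Threadripper 1950X.
print(configured_compile_threads(16, 2, 30))  # -> 30
# Raising the cap to 32 would use every hardware thread.
print(configured_compile_threads(16, 2, 32))  # -> 32
```

The cap is the value to tune if you want to leave more or fewer logical cores free for other applications.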

If your settings have been configured properly, you will get the following UE4 build process output:

Building XXX actions with 30 processes...

Memory

Ryzen Threadripper 1000 series processors officially support up to DDR4-2667 memory speed, and Threadripper 2000 series officially supports up to DDR4-2933 memory speed. You want to ensure that you buy fast enough memory to reach the official clock rates. We didn’t overclock our memory because stability is very important for professional work. Maximum memory speed can be reached by equipping the system with four identical DIMMs. This will enable quad-channel memory, which is essential for extracting the most performance out of Threadripper systems. If you equip more than four DIMMs the maximum supported memory frequency will be slightly lower, due to extra load on the memory subsystem.

Unreal Engine consumes around 0.7 GB of RAM per compiling thread. When you compile with all 32 threads of the Threadripper 1950X, memory usage peaks at around 21 GB. Shader compiler memory peak is similar. Thus, you should buy at least 32 GB of DDR4-2667 for a 32-thread system. If you don’t have enough memory, the whole system will slow to a crawl when the compiling processes eat all of your available memory. If you prefer to perform memory-heavy tasks during compilation, such as editing 16-bit Photoshop files with lots of layers or testing your game, I strongly recommend 64 GB.
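The sizing rule above can be turned into a quick estimate for other thread counts. This is a back-of-the-envelope sketch: the 0.7 GB per thread figure comes from this post, while the 8 GB headroom for the operating system and editor and the list of kit sizes are assumptions chosen for illustration.

```python
GB_PER_COMPILE_THREAD = 0.7  # figure quoted in the text


def estimated_compile_memory_gb(threads: int) -> float:
    """Rough upper estimate of compiler memory use, in GB."""
    return threads * GB_PER_COMPILE_THREAD


def suggested_ram_gb(threads: int, headroom_gb: float = 8.0,
                     kit_sizes=(16, 32, 64, 128)) -> int:
    """Smallest kit size covering the estimate plus headroom.

    Assumption: headroom_gb and kit_sizes are illustrative defaults,
    not recommendations from AMD or Epic.
    """
    needed = estimated_compile_memory_gb(threads) + headroom_gb
    for size in kit_sizes:
        if size >= needed:
            return size
    return kit_sizes[-1]


print(round(estimated_compile_memory_gb(32), 1))  # -> 22.4
print(suggested_ram_gb(32))                       # -> 32
```

The linear estimate lands slightly above the roughly 21 GB peak we measured, which is the safe direction to be wrong in when buying memory.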

Let’s take a look at the impact of memory clock on compile performance with the 1950X:

| Processor / Memory Speed | Threads | Time | Performance |
| --- | --- | --- | --- |
| Intel Core i7-6700K | 6 | 40m 18s (2,418s) | 1x (baseline) |
| AMD Ryzen Threadripper 1950X @ DDR4-2133 | 32 | 14m 43s (883s) | 2.74x |
| AMD Ryzen Threadripper 1950X @ DDR4-2400 | 32 | 13m 56s (836s) | 2.89x |
| AMD Ryzen Threadripper 1950X @ DDR4-2667 | 32 | 12m 53s (773s) | 3.13x |

The result is clear. You save almost 2 minutes by using DDR4-2667 memory, versus DDR4-2133. Definitely buy the fastest memory DIMMs you can get when you are upgrading to Threadripper.
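For readers who want to reproduce the Performance column, the arithmetic is simply the baseline time divided by the measured time. The short sketch below uses the compile times from the table above; the variable and function names are only illustrative.

```python
BASELINE_SECONDS = 2418  # Core i7-6700K, 6 threads

# Threadripper 1950X, 32 threads, compile time in seconds per memory speed.
compile_seconds = {
    "DDR4-2133": 883,
    "DDR4-2400": 836,
    "DDR4-2667": 773,
}


def speedup(seconds: int, baseline: int = BASELINE_SECONDS) -> float:
    """Speedup relative to the baseline compile time."""
    return baseline / seconds


slowest = compile_seconds["DDR4-2133"]
for memory, seconds in compile_seconds.items():
    saved = slowest - seconds
    print(f"{memory}: {speedup(seconds):.2f}x, {saved}s saved vs DDR4-2133")
```

The last row shows 110 seconds saved, which is the "almost 2 minutes" quoted above.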

It’s worth repeating that Threadripper 2000 series CPUs officially support DDR4-2933. I would expect to see around one minute lower compile time based on memory speed alone. Add +100 MHz base clock, +400 MHz turbo clock and ~3% IPC bump and the Ryzen Threadripper 2950X seems like a solid upgrade over the Threadripper 1950X we used to build Claybook.

NUMA

By default, some Threadripper mainboards configure the processor to use UMA memory mode. This is better suited for gaming, and results in around 2% higher average gaming performance. But some new games already run better in NUMA mode, due to being more aware of what it means to run on a NUMA system.

You really want to enable NUMA mode in your Threadripper BIOS if it’s not enabled by default. C++ and shader compilation doesn’t usually share any data between threads, so the operating system can keep each process’s memory local to the die it runs on, which is exactly the case NUMA mode is designed for. Each compiling process is single-threaded, but we have 32 of them running in parallel.

Enable NUMA by going into your BIOS and changing Memory Interleaving to Channel. The setting usually goes by that name, but consult your X399 mainboard BIOS manual to confirm.

Let’s see what impact that has on performance:

| Processor / Memory Speed | Threads | Time | Performance |
| --- | --- | --- | --- |
| Intel Core i7-6700K | 6 | 40m 18s (2,418s) | 1x (baseline) |
| AMD Ryzen Threadripper 1950X @ DDR4-2667 | 32 | 12m 53s (773s) | 3.13x |
| AMD Ryzen Threadripper 1950X @ DDR4-2667 + NUMA | 32 | 12m 6s (726s) | 3.33x |

We see another 47-second reduction in compile time by enabling NUMA. We are now compiling more than 3x faster than our starting point (Core i7 with 8 threads). This is an expected result, since we are comparing a 4 GHz 4-core processor with a 3.67 GHz 16-core processor. The Threadripper 1950X has 4x the core count, but a slightly lower clock rate and slightly lower IPC. The brand-new Threadripper 2950X improves both, likely getting us closer to the 4x theoretical performance gain.
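The comparison above can be made concrete with a crude scaling estimate: core count ratio multiplied by clock ratio. This sketch deliberately ignores IPC differences, SMT behaviour and the single-threaded part of the build, so treat it as a sanity check rather than a performance model. The function name `naive_scaling` is made up for the example.

```python
def naive_scaling(cores_new: int, clock_new_ghz: float,
                  cores_old: int, clock_old_ghz: float) -> float:
    """Core-count ratio times clock ratio (ignores IPC, SMT, serial work)."""
    return (cores_new / cores_old) * (clock_new_ghz / clock_old_ghz)


# Threadripper 1950X at its observed 3.67 GHz vs Core i7-6700K at 4.0 GHz.
estimate = naive_scaling(16, 3.67, 4, 4.0)

# Measured: Core i7 with 8 threads (2,232s) vs tuned Threadripper (726s).
measured = 2232 / 726

print(f"naive estimate: {estimate:.2f}x")
print(f"measured:       {measured:.2f}x")
print(f"fraction of estimate achieved: {measured / estimate:.0%}")
```

With these inputs the measured speedup comes out at roughly 84% of the naive estimate, which fits the slightly lower IPC and the serial portion of the build mentioned earlier.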

Shader Compile and Data Cooking

The Unreal Engine editor has a worker thread system for compiling shaders and cooking data. We started Claybook development using UE4 4.8 and ended up with UE4 4.21. During this time, we merged almost every major version. After each finished merge we cleaned the compiled data directories (DerivedDataCache, Intermediate, Saved) to ensure that no deprecated data remained. The first UE4 editor startup after a full merge took 30+ minutes on the Core i7-6700K. We didn’t use UE4 shader node graphs that much, but we still ended up with over 2000 shader permutations. Threadripper made the merge process much faster.

Cooking data for package builds is another task that will take a long time. When we iterated on the Xbox and PlayStation 4 console certification build candidates, we prepared close to one hundred internal package builds in total. The package build script fully recompiles the source code for that platform and packages the data. Threadripper was a huge help.

Ryzen Threadripper 1950X compiling all of Claybook’s shaders for the Xbox One target. Almost 100% all-core utilisation. 3.52 GHz clock rate, which is +120 MHz turbo over the base clock. Dark blue is time spent in the kernel, light blue is time spent in user code.

HPET timer woes

One of the clean Windows 10 Pro installs on our ASRock Taichi X399 mainboard had stuttering problems after installing the Windows 10 Meltdown and Spectre patches, and the X399 platform drivers. I installed and upgraded Windows before I updated the mainboard BIOS, which could have caused the following issue.

After some investigation, I concluded that the stuttering affected programs that make lots of timer calls. The UE4 development editor frame rate was halved, because its code base has lots of timing brackets. Benchmarking software such as the Samsung Magician SSD speed tester and Intel VTune also slowed the system to a halt.

I got some help from other Threadripper users and AMD and tracked the issue down to use of the HPET timer. If you believe your system is suffering from this problem, run bcdedit in an Administrator command prompt. This command shows some system boot information. If you see the following line in its output, you are affected by this issue: useplatformclock Yes

The fix is simple: run bcdedit /deletevalue useplatformclock and then reboot your computer. If you want to restore the HPET timer for some reason, run bcdedit /set useplatformclock true instead.
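If you need to check several workstations, the bcdedit output can be scanned with a short script instead of by eye. This is a minimal sketch that assumes an English-language Windows install, where the affected line reads useplatformclock followed by Yes. The helper names `uses_platform_clock` and `read_bcdedit_output` are made up for this example.

```python
import subprocess


def uses_platform_clock(bcdedit_output: str) -> bool:
    """Return True if the output contains 'useplatformclock Yes'.

    Assumption: English-language output; localized Windows versions
    may print a different word for 'Yes'.
    """
    for line in bcdedit_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].lower() == "useplatformclock":
            return parts[1].lower() == "yes"
    return False


def read_bcdedit_output() -> str:
    """Run bcdedit and return its text output.

    Windows only, and it must be run from an Administrator prompt.
    """
    result = subprocess.run(["bcdedit"], capture_output=True,
                            text=True, check=True)
    return result.stdout


# Example usage on an affected machine (run as Administrator):
#   if uses_platform_clock(read_bcdedit_output()):
#       print("HPET is forced on; consider the deletevalue command.")
```

The script only reads the setting. Applying the fix is left as a manual step so that nothing changes the boot configuration unattended.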

You need to execute the deletevalue command once and then the stuttering issue should be gone for good. AMD has confirmed that the HPET timer should be disabled by default on Threadripper, and I would assume that new Windows 10 versions and new mainboard BIOS versions fix this issue. After running this command, our Threadripper system with ASRock mainboard performs as well as our second Threadripper system with an ASUS mainboard.

Conclusion

It’s good to see strong competition between vendors in the x86 CPU market again. Ryzen Threadripper is a welcome new option for the high-performance workstation market. It maps especially well to game development workloads, and it has the best cost/performance ratio for game development in the market today.

As a new platform there were some teething troubles and tweaks needed to extract maximum performance, but once those were figured out the platform and processor were able to deliver most of the on-paper promise of a system with 16 CPU cores with SMT and quad-channel memory. Remember: if you’re putting your own Threadripper system together for game development, equip your system with fast DDR4 memory, and ensure that you purchase at least 1 GB per thread if you use Unreal Engine 4.

The benefits of using Threadripper for game development are clear: huge speedups in common game development tasks like compiling engine code, shaders and building assets, all while providing a responsive system you can keep using for other tasks while you’re doing that.

You can check out the game we built at www.claybookgame.com. It’s out today on Steam, PlayStation 4 and Xbox One, go check it out!
