
Performance Guide
Our AMD Ryzen™ Performance Guide will help guide you through the optimization process with a collection of tidbits, tips, and tricks which aim to support you in your performance quest.
PresentMon is a Command Line Interface (CLI) tool for logging frame times such as MsBetweenPresents.
Example:
PresentMon-1.6.0-x64.exe -process_name "MyGame.exe" -stop_existing_session -terminate_on_proc_exit -terminate_after_timed -timed 60 -output_file "%CD%\result\presentmon.csv"
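Once a capture exists, the MsBetweenPresents values can be summarized offline. The following is a minimal sketch, assuming the column has already been parsed into a vector of milliseconds; the `FrameStats` struct and `summarizeFrameTimes` function are hypothetical names for illustration and are not part of PresentMon. It uses the nearest-rank definition of the 99th percentile, which is one of several common conventions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical summary of one capture.
struct FrameStats {
    double averageFps;      // frames per second over the whole capture
    double p99FrameTimeMs;  // 99th percentile frame time (nearest-rank)
};

// msBetweenPresents: values of the MsBetweenPresents column, in milliseconds.
// Taken by value because the function sorts its own copy.
FrameStats summarizeFrameTimes(std::vector<double> msBetweenPresents)
{
    assert(!msBetweenPresents.empty());

    const double totalMs = std::accumulate(
        msBetweenPresents.begin(), msBetweenPresents.end(), 0.0);
    const double averageFps = 1000.0 * msBetweenPresents.size() / totalMs;

    std::sort(msBetweenPresents.begin(), msBetweenPresents.end());
    // Nearest-rank percentile: 1-based rank = ceil(0.99 * N).
    const std::size_t rank = static_cast<std::size_t>(
        std::ceil(0.99 * msBetweenPresents.size()));
    const double p99 = msBetweenPresents[rank - 1];

    return { averageFps, p99 };
}
```

Reporting a high percentile next to the average helps because a few long frames can be hidden by a healthy average frame rate.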
OCAT is a Graphical User Interface (GUI) tool with hot key support for logging frame times based on PresentMon.
WPA is a highly configurable tool for finding system performance bottlenecks and is ideal for filtering and visualizing call stacks.
Traces for WPA can be recorded with wpr.exe or xperf.exe.
wpr.exe is included in all Windows 10 installations.
xperf.exe is included in the Windows SDK.
GPUView is a tool for analyzing GPU performance with regard to direct memory access (DMA) buffer processing.
You can use the Concurrency Visualizer for Visual Studio to locate performance bottlenecks, CPU underutilization, thread contention, cross-core thread migration, synchronization delays, DirectX activity, areas of overlapped I/O, and other information.
Radeon GPU Analyzer (RGA) is an offline compiler and performance analysis tool for DirectX, Vulkan®, SPIR-V™, OpenGL® and OpenCL™.
In Build.h, #define FORCE_USE_STATS and #define STATS should never be enabled during Shipping builds.
Consider ALLOW_CONSOLE_IN_SHIPPING during game development.
Run Unreal Engine UE4Editor MapCheck to find errors.
Use Unity® AssetPostprocessor to enforce minimum standards.
DAMAGE CAUSED BY USE OF YOUR AMD PROCESSOR OUTSIDE OF SPECIFICATION OR IN EXCESS OF FACTORY SETTINGS ARE NOT COVERED UNDER YOUR AMD PRODUCT WARRANTY AND MAY NOT BE COVERED BY YOUR SYSTEM MANUFACTURER’S WARRANTY. Operating your AMD processor outside of specification or in excess of factory settings, including but not limited to overclocking, may damage or shorten the life of your processor or other system components, create system instabilities (e.g. data loss and corrupted images), and in extreme cases may result in total system failure. AMD does not provide support or service for issues or damages related to use of an AMD processor outside of processor specifications or in excess of factory settings.
bcdedit.exe /deletevalue useplatformclock
bcdedit.exe /set useplatformclock yes
rem Run as administrator
rem Disable Steam Shader Pre-Caching before running this script
rem Reboot after running this script to clear any shaders still in system memory
setlocal enableextensions
cd /d "%~dp0"
rmdir /s /q "%LOCALAPPDATA%\D3DSCache"
rmdir /s /q "%LOCALAPPDATA%\AMD\DxCache"
rmdir /s /q "%LOCALAPPDATA%\AMD\GLCache"
rmdir /s /q "%LOCALAPPDATA%\AMD\VkCache"
rmdir /s /q "%ProgramData%\NVIDIA Corporation\NV_Cache"
rmdir /s /q "%ProgramFiles(x86)%\Steam\steamapps\shadercache"
Hypervisor-Protected Code Integrity (HVCI) is labelled Memory Integrity in the Windows Security app.
The symstore and symbol path can be powerful tools for loading vendor symbols and providing hints to tools which do not check the local directory.
The symbol path is set with the _NT_SYMBOL_PATH environment variable.
_NT_SYMBOL_PATH=cache*c:\symbols;srv*https://download.amd.com/dir/bin;srv*https://driver-symbols.nvidia.com/;srv*http://msdl.microsoft.com/download/symbols
Adding "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64" to the PATH is recommended.
symstore.exe add /r /f *.pdb /s c:\symbols /t "MyProject"
Typically, the application is CPU-bound if GPU Idle > 5%.
Look for bubbles of idle work on the GPU in tools such as RGP, GPUView, and the Visual Studio Concurrency Visualizer.
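The 5% rule of thumb can be expressed as a small helper. This is only a sketch: the function names and the idea of deriving GPU idle from a measured GPU busy time per frame are assumptions made for illustration, and a frame that falls below the threshold is reported simply as "not flagged", since the text does not define the opposite case.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: percentage of the frame during which the GPU was idle.
// gpuBusyMs and frameMs would come from a profiler capture.
double gpuIdlePercent(double gpuBusyMs, double frameMs)
{
    assert(frameMs > 0.0 && gpuBusyMs >= 0.0);
    const double busy = std::min(gpuBusyMs, frameMs);  // clamp measurement noise
    return 100.0 * (1.0 - busy / frameMs);
}

// Rule of thumb from the text: likely CPU-bound if GPU idle exceeds 5%.
bool isLikelyCpuBound(double idlePercent, double thresholdPercent = 5.0)
{
    return idlePercent > thresholdPercent;
}
```

For example, a 16 ms frame with 12 ms of GPU work gives 25% GPU idle, which the helper flags as likely CPU-bound. The result is a hint for where to look first, not a substitute for inspecting the capture.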
There are multiple tools and methods available for developers to detect boundedness:
Radeon GPU Profiler (RGP)
rem run as administrator
rem add "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\gpuview" to path
setlocal enableextensions
cd /d "%~dp0"
rem switch active foreground window back to the game application
timeout.exe /t 5
call log.cmd light
timeout.exe /t 5
call log.cmd
rem open Merged.etl

rem run as administrator
setlocal enableextensions
cd /d "%~dp0"
rem switch active foreground window back to the game application
timeout.exe /t 5
wpr.exe -start gpu -filemode
timeout.exe /t 5
wpr.exe -stop out.etl
rem open out.etl
Command | Recommended Value |
---|---|
r.rhicmdbypass | 0 |
r.rhicmdusedeferredcontexts | 1 |
r.rhicmduseparallelalgorithms | 1 |
r.rhithread.enable | 1 |
Use a cold shader cache while verifying parallel DX12 pipeline state creation.
Add "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\gpuview" to the PATH.
rem run as administrator
rem clear shader cache
call log.cmd
rem collect samples while game is starting and calling D3D12.dll!CDevice::CreatePipelineState
call log.cmd
Open the etl log file with the Windows Performance Analyzer.
Find D3D12.dll!CDevice::CreatePipelineState within the Flame by Process, Stack.
This find command highlights the samples of interest in the CPU Usage (Precise) graph:
Add the GPUView directory to the PATH.
rem run as administrator
rem add "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\gpuview" to path
setlocal enableextensions
cd /d "%~dp0"
rem switch active foreground window back to the game application
timeout.exe /t 5
call log.cmd
rem collect samples while game is playing and rendering frames. 1 second should be more than enough data.
timeout.exe /t 1
call log.cmd
WinDbg may be used for setting breakpoints, logging, skipping functions, editing memory, or editing registers.
In the Windows x64 calling convention, the first four integer or pointer arguments are passed in RCX, RDX, R8, and R9. Arguments five and higher are passed on the stack.
A steam_appid.txt file or SteamAppId system environment variable may be needed to launch an executable from WinDbg.
To check whether DXGI_GPU_PREFERENCE_HIGH_PERFORMANCE was used:
DXGI_GPU_PREFERENCE_HIGH_PERFORMANCE (2) is recommended for optimal performance on hybrid graphics systems.
bp dxgi!CDXGIFactory::EnumAdapterByGpuPreference ".printf \"FOUND DXGIFactory::EnumAdapterByGpuPreference DXGI_GPU_PREFERENCE=%x\\n\",@r8"
To check that GetLogicalProcessorInformation(Ex) calls with non-zero input buffer lengths return success:
An application typically first calls with a buffer length of 0 to get the buffer length to malloc.
The following call with a non-zero buffer length should return success (return 1).
bp kernelbase!GetLogicalProcessorInformation "bp /1 @$ra \".printf \\\"GetLogicalProcessorInformation returned %i\\\", @rax; .echo; g\"; .printf \"GetLogicalProcessorInformation input buffer length 0x%x\", poi(@rdx); .echo; g"
bp kernelbase!GetLogicalProcessorInformationEx "bp /1 @$ra \".printf \\\"GetLogicalProcessorInformationEx returned %i\\\", @rax; .echo; g\"; .printf \"GetLogicalProcessorInformationEx input buffer length 0x%x\", poi(@r8); .echo; g"
The DirectX APIs refer to Accelerated Processing Units (APUs) or Integrated Graphics parts via the term Unified Memory Architecture (UMA).
bool isUMA(ID3D12Device* pDevice)
{
    bool result = false;
    D3D12_FEATURE_DATA_ARCHITECTURE data = {};
    if (S_OK == pDevice->CheckFeatureSupport(D3D12_FEATURE_ARCHITECTURE, &data, sizeof(data)))
    {
        result = data.UMA;
    }
    return result;
}
//
// Copyright (c) 2021 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
//
#include <cstdio>
#include <iostream>
#include <dxgi1_4.h>
#include <d3d12.h>

#pragma comment( lib, "dxgi" )
#pragma comment( lib, "d3d12" )

// Returns true when the device reports a Unified Memory Architecture (UMA).
bool isUMA(ID3D12Device* pDevice)
{
    bool result = false;
    D3D12_FEATURE_DATA_ARCHITECTURE data = {};
    if (S_OK == pDevice->CheckFeatureSupport(D3D12_FEATURE_ARCHITECTURE, &data, sizeof(data)))
    {
        result = data.UMA;
    }
    return result;
}

int main()
{
    ID3D12Device* pDevice = nullptr;
    if (SUCCEEDED(D3D12CreateDevice(NULL, D3D_FEATURE_LEVEL_11_0, __uuidof(ID3D12Device), (void**)&pDevice)))
    {
        IDXGIFactory* pFactory = nullptr;
        IDXGIFactory4* pFactory4 = nullptr;
        if (SUCCEEDED(CreateDXGIFactory(__uuidof(IDXGIFactory), (void**)(&pFactory)))
            && SUCCEEDED(pFactory->QueryInterface(__uuidof(IDXGIFactory4), (void**)&pFactory4)))
        {
            // Find the adapter that backs the created device.
            LUID luid = pDevice->GetAdapterLuid();
            IDXGIAdapter* pAdapter = nullptr;
            DXGI_ADAPTER_DESC desc;
            if (SUCCEEDED(pFactory4->EnumAdapterByLuid(luid, __uuidof(IDXGIAdapter), (void**)&pAdapter))
                && SUCCEEDED(pAdapter->GetDesc(&desc)))
            {
                printf("DedicatedVideoMemory %I64u\n", desc.DedicatedVideoMemory);
                printf("DedicatedSystemMemory %I64u\n", desc.DedicatedSystemMemory);
                printf("SharedSystemMemory %I64u\n", desc.SharedSystemMemory);
                printf("isUMA %i\n", isUMA(pDevice));

                // Alternative method: derive the budget from the adapter description.
                SIZE_T budget = desc.DedicatedVideoMemory;
                if (isUMA(pDevice))
                {
                    budget += desc.DedicatedSystemMemory + desc.SharedSystemMemory;
                }

                // Preferred method: ask DXGI for the current local segment budget.
                IDXGIAdapter3* pAdapter3 = nullptr;
                DXGI_QUERY_VIDEO_MEMORY_INFO info = {};
                if (SUCCEEDED(pAdapter->QueryInterface(__uuidof(IDXGIAdapter3), (void**)&pAdapter3))
                    && SUCCEEDED(pAdapter3->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &info)))
                {
                    budget = info.Budget;
                }
                printf("budget %I64u\n", budget);
            }
        }
    }
}
bool isUMA(ID3D11Device* pDevice)
{
    bool result = false;
    ID3D11Device3* pD3D11Device3 = nullptr;
    if (S_OK == pDevice->QueryInterface(IID_PPV_ARGS(&pD3D11Device3)) && pD3D11Device3)
    {
        D3D11_FEATURE_DATA_D3D11_OPTIONS2 data = {};
        if (S_OK == pD3D11Device3->CheckFeatureSupport(D3D11_FEATURE_D3D11_OPTIONS2, &data, sizeof(data)))
        {
            result = data.UnifiedMemoryArchitecture;
        }
        pD3D11Device3->Release();
    }
    return result;
}
//
// Copyright (c) 2021 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
//
#include <cstdio>
#include <iostream>
#include <dxgi1_4.h>
#include <d3d11_3.h>

#pragma comment( lib, "dxgi" )
#pragma comment( lib, "d3d11" )

// Returns true when the device reports a Unified Memory Architecture (UMA).
bool isUMA(ID3D11Device* pDevice)
{
    bool result = false;
    ID3D11Device3* pD3D11Device3 = nullptr;
    if (S_OK == pDevice->QueryInterface(IID_PPV_ARGS(&pD3D11Device3)) && pD3D11Device3)
    {
        D3D11_FEATURE_DATA_D3D11_OPTIONS2 data = {};
        if (S_OK == pD3D11Device3->CheckFeatureSupport(D3D11_FEATURE_D3D11_OPTIONS2, &data, sizeof(data)))
        {
            result = data.UnifiedMemoryArchitecture;
        }
        pD3D11Device3->Release();
    }
    return result;
}

int main()
{
    UINT flags = 0; // D3D11_CREATE_DEVICE_SINGLETHREADED;
    D3D_FEATURE_LEVEL featureLevels[] = { D3D_FEATURE_LEVEL_11_0 };
    UINT numFeatureLevels = ARRAYSIZE(featureLevels);
    D3D_FEATURE_LEVEL featureLevel;
    ID3D11Device* pDevice = nullptr;
    ID3D11DeviceContext* pImmediateContext = nullptr;
    if (SUCCEEDED(D3D11CreateDevice(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, flags, featureLevels,
        numFeatureLevels, D3D11_SDK_VERSION, &pDevice, &featureLevel, &pImmediateContext)))
    {
        IDXGIDevice* pDXGIDevice = nullptr;
        IDXGIAdapter* pAdapter = nullptr;
        DXGI_ADAPTER_DESC desc;
        if (SUCCEEDED(pDevice->QueryInterface(__uuidof(IDXGIDevice), (void**)&pDXGIDevice))
            && SUCCEEDED(pDXGIDevice->GetAdapter(&pAdapter))
            && SUCCEEDED(pAdapter->GetDesc(&desc)))
        {
            printf("DedicatedVideoMemory %I64u\n", desc.DedicatedVideoMemory);
            printf("DedicatedSystemMemory %I64u\n", desc.DedicatedSystemMemory);
            printf("SharedSystemMemory %I64u\n", desc.SharedSystemMemory);
            printf("isUMA %i\n", isUMA(pDevice));

            // Alternative method: derive the budget from the adapter description.
            SIZE_T budget = desc.DedicatedVideoMemory;
            if (isUMA(pDevice))
            {
                budget += desc.DedicatedSystemMemory + desc.SharedSystemMemory;
            }

            // Preferred method: ask DXGI for the current local segment budget.
            IDXGIAdapter3* pAdapter3 = nullptr;
            DXGI_QUERY_VIDEO_MEMORY_INFO info = {};
            if (SUCCEEDED(pAdapter->QueryInterface(__uuidof(IDXGIAdapter3), (void**)&pAdapter3))
                && SUCCEEDED(pAdapter3->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &info)))
            {
                budget = info.Budget;
            }
            printf("budget %I64u\n", budget);
        }
    }
}
Integrated graphics parts which share their video memory with the CPU require special considerations when detecting VRAM budgets.
Preferred method:
IDXGIAdapter3* pAdapter3 = nullptr;
DXGI_QUERY_VIDEO_MEMORY_INFO info = {};
if (SUCCEEDED(pAdapter->QueryInterface(__uuidof(IDXGIAdapter3), (void**)&pAdapter3))
    && SUCCEEDED(pAdapter3->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &info)))
{
    budget = info.Budget;
}
Alternative method:
DXGI_ADAPTER_DESC desc;
if (SUCCEEDED(pAdapter->GetDesc(&desc)))
{
    SIZE_T budget = desc.DedicatedVideoMemory;
    if (isUMA(pDevice))
    {
        budget += desc.DedicatedSystemMemory + desc.SharedSystemMemory;
    }
}
DedicatedVideoMemory: This represents the actual local memory on discrete GPUs and the dedicated carve-out system memory on integrated GPUs.
DedicatedSystemMemory: This value is always zero on AMD GPUs.
SharedSystemMemory: This is determined by the GPU KMD and may return up to half of system memory.
DedicatedVideoMemorySize alone may be insufficient to run some gaming applications on systems with integrated graphics (UMA).
Add SharedSystemMemorySize, then rely on the GPU KMD and the vidMm to assign system memory optimally.
Use CheckFeatureSupport to query UMA.
Sometimes feature scaling may be required in order to achieve acceptable framerates on thermally limited platforms.
Consider DXGI_FORMAT_R11G11B10_FLOAT rather than DXGI_FORMAT_R16G16B16A16_FLOAT.
r.SceneColorFormat
r.AmbientOcclusionLevels
Additional considerations may be necessary to ensure the expected GPU is utilized in hybrid graphics platforms.
Use IDXGIFactory6::EnumAdapterByGpuPreference.
Use DXGI_GPU_PREFERENCE_HIGH_PERFORMANCE for game applications.
The following WinDbg breakpoint can be used to check for DXGI_GPU_PREFERENCE=2 (DXGI_GPU_PREFERENCE_HIGH_PERFORMANCE).
bp dxgi!CDXGIFactory::EnumAdapterByGpuPreference ".printf \"FOUND DXGIFactory::EnumAdapterByGpuPreference DXGI_GPU_PREFERENCE=%x\\n\",@r8"
memcpy, memset, and other C runtime optimizations.
Aligning memcpy source and destination to a 4096 byte page boundary may reduce Zen 2 store to load forwarding events (See STLIOther in AMD µProf).
Alignment to a 4096 byte page boundary may benefit probe filtering on AMD Threadripper™ and EPYC™ processors.
Alignment to the cache line size (64 bytes) may reduce false sharing.
Aligned memory can be allocated with _aligned_malloc or C++17 aligned new.
Align to a 64 byte cache line or 4096 byte page.
Modern sync APIs include std::mutex, std::shared_mutex, SRWLock, and EnterCriticalSection.
They are generally preferable to WaitForSingleObject or user spin locks.
They may use the mwaitx instruction efficiently to wait on an address or timeout.
They may reduce Syscall overhead.
SetEventOnCompletion() may be as efficient as the old fence polling model while avoiding starving other threads or unnecessarily consuming power.
%NUMBER_OF_PROCESSORS%
This advice is specific to AMD processors and is not general guidance for all processor vendors.
Generally, applications show SMT benefits and use of all logical processors is recommended. However, games often suffer from SMT contention on the main or render threads during gameplay.
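The 64 byte cache line and 4096 byte page alignment advice given earlier can be sketched in portable C++17. This is a minimal illustration rather than code from the guide: `PaddedCounter`, `allocatePageAligned`, and `isAligned` are hypothetical names, the 64 and 4096 constants are the values quoted in the text, and the sketch uses C++17 aligned `operator new` instead of the Windows-specific `_aligned_malloc`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

constexpr std::size_t kCacheLineSize = 64;    // cache line size quoted in the text
constexpr std::size_t kPageSize      = 4096;  // page size quoted in the text

// Each counter occupies its own cache line, so threads that update
// neighbouring counters do not write to the same line (false sharing).
struct alignas(kCacheLineSize) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLineSize, "one counter per cache line");
static_assert(alignof(PaddedCounter) == kCacheLineSize, "counter starts on a cache line");

// Deleter that matches the aligned allocation below.
struct PageAlignedDeleter {
    void operator()(std::byte* p) const noexcept {
        ::operator delete[](p, std::align_val_t{kPageSize});
    }
};
using PageAlignedBuffer = std::unique_ptr<std::byte[], PageAlignedDeleter>;

// Allocates a buffer whose start address is page aligned (C++17 aligned new).
PageAlignedBuffer allocatePageAligned(std::size_t bytes)
{
    void* p = ::operator new[](bytes, std::align_val_t{kPageSize});
    return PageAlignedBuffer(static_cast<std::byte*>(p));
}

// True when p is a multiple of alignment.
bool isAligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

An array of `PaddedCounter` gives each worker thread its own line at the cost of 56 bytes of padding per counter, so it is best reserved for data that is written frequently by different threads.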