Training an X-ARM 5 robotic arm with AMD Schola and Unreal Engine
Train a robot arm with reinforcement learning in AMD Schola and Unreal® Engine, progressively increasing task complexity so the trained policy adapts to changing conditions.

AMD Schola is a library for developing reinforcement learning (RL) agents in Unreal® Engine and training them with your favorite Python-based RL frameworks: Gym, RLlib, and Stable Baselines 3.
Schola provides tools for connecting and controlling agents with ONNX models inside Unreal Engine, allowing for inference with or without Python.
Schola exposes simple interfaces in Unreal Engine for the user to implement, allowing you to quickly build and develop reinforcement learning environments.
Environments in Schola are modular so you can quickly design new agents from existing components, such as sensors and actuators.
Train multiple agents to compete against each other at the same time using RLlib and multi-agent environments built with Schola.
Run multiple copies of your environment within the same Unreal Engine process to accelerate training.
Run training without rendering to significantly improve training throughput.
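To make the Python side of that workflow concrete, here is a minimal Stable Baselines 3 sketch of the kind of vectorized training loop these features feed into. The CartPole-v1 task and the saved model name are placeholders: with Schola, the vectorized environment would instead be backed by one or more environment copies running inside Unreal Engine, and the snippet does not use Schola's own connection API.

# Minimal Stable Baselines 3 training sketch (placeholder environment).
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Several environment copies sampled in parallel, mirroring the idea of
# running multiple copies of an Unreal environment to accelerate training.
vec_env = make_vec_env("CartPole-v1", n_envs=4)

model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=50_000)
model.save("ppo_demo")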
The Basic environment features an agent that can move in the X-dimension and receives a small reward for going five steps in one direction and a bigger reward for going in the opposite direction.
The MazeSolver environment features a static maze that the agent learns to solve as fast as possible. The agent observes the environment using raycasts, moves by teleporting in two dimensions, and is rewarded for getting closer to the goal.
The 3DBall environment features an agent that tries to balance a ball on top of itself. The agent can rotate itself and receives a reward every step until the ball falls.
The BallShooter environment features a rotating turret that learns to aim and shoot at randomly moving targets. The agent can rotate in either direction and detects targets using a cone-shaped raycast.
The Pong environment features two agents playing a collaborative game of Pong. The agents receive a reward every step as long as the ball has not hit the wall behind either agent; the game ends when it does.
The Tag environment features a 3v1 game of tag, where one agent (the runner) has to run away from the other agents, which try to collide with it. The agents move using forward, left, and right movement inputs, and observe the environment with a combination of raycasts and global position data.
The RaceTrack environment features a car, implemented with Chaos Vehicles, that learns to follow a race track. The agent controls the throttle, brake, and steering of the car, and can observe its velocity and position relative to the center of the track.
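As a rough illustration of the simplest of these samples, the sketch below re-creates the Basic environment's reward scheme as a standalone Gymnasium environment. The goal distances and reward values are assumptions chosen for illustration, not the exact numbers used by the Schola sample, and the real environment lives inside Unreal Engine rather than pure Python.

# Illustrative Gymnasium re-creation of the Basic environment's reward scheme.
import gymnasium as gym
import numpy as np

class BasicLine(gym.Env):
    def __init__(self):
        self.observation_space = gym.spaces.Box(low=-10.0, high=10.0, shape=(1,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)  # 0: step left, 1: step right
        self.x = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.x = 0
        return np.array([self.x], dtype=np.float32), {}

    def step(self, action):
        self.x += 1 if action == 1 else -1
        reward, terminated = 0.0, False
        if self.x >= 5:        # nearby goal: small reward (assumed value)
            reward, terminated = 0.1, True
        elif self.x <= -10:    # far goal in the opposite direction: bigger reward (assumed value)
            reward, terminated = 1.0, True
        return np.array([self.x], dtype=np.float32), reward, terminated, False, {}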


Unreal® is a trademark or registered trademark of Epic Games, Inc. in the United States of America and elsewhere.
“Python” is a trademark or registered trademark of the Python Software Foundation.
What's new in AMD Schola v2.0.0?
Flexible inference architecture with agent/policy/stepper system
AMD Schola v2 introduces a powerful and flexible architecture that decouples the inference process into components for maximum flexibility and reusability. This modular design allows you to mix and match different policies, stepping strategies, and agent implementations to suit your specific needs.
Key components:
Agent Interface - Define what can be controlled by an inference policy.
UInferenceComponent - Add inference to any actor.
AInferencePawn - Standalone pawn-based agents.
AInferenceController - AI controller pattern for complex behaviors.
Policy Interface - Plug in different inference backends that turn observations into actions.
UNNEPolicy - Native ONNX inference with Unreal Engine's Neural Network Engine.
UBlueprintPolicy - Custom Blueprint-based decision making.
Stepper Objects - Control inference execution patterns by coordinating agents and policies.
SimpleStepper - Synchronous, straightforward inference.
PipelinedStepper - Overlap inference with simulation for better throughput.
Build custom steppers for specialized performance requirements.
This architecture means you can easily switch between inference backends, optimize performance characteristics, and compose behaviors without rewriting your agent logic. Whether you're prototyping with Blueprints or deploying production-ready neural networks, the same agent interface works seamlessly with your chosen policy and execution strategy.
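The decoupling is easiest to see as pseudocode. The sketch below is a conceptual Python rendering of the agent/policy/stepper split, not Schola's actual API: in the plugin these roles are filled by Unreal Engine types such as UInferenceComponent, UNNEPolicy, and SimpleStepper.

# Conceptual sketch of the agent/policy/stepper decoupling (not Schola's API).
from typing import Protocol, Sequence

class Agent(Protocol):
    def get_observation(self) -> list[float]: ...
    def apply_action(self, action: list[float]) -> None: ...

class Policy(Protocol):
    def decide(self, observation: list[float]) -> list[float]: ...

class SimpleStepperSketch:
    """Synchronously runs one observe -> decide -> act pass per agent."""

    def __init__(self, pairs: Sequence[tuple[Agent, Policy]]):
        self.pairs = pairs

    def step(self) -> None:
        for agent, policy in self.pairs:
            agent.apply_action(policy.decide(agent.get_observation()))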
Minari dataset support
AMD Schola v2 introduces native support for the Minari dataset format, the standard for offline RL and imitation learning datasets. Minari provides a unified interface for storing and loading trajectory data, making it easier to share demonstrations and datasets across different projects and research communities.
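For orientation, here is a minimal sketch of consuming a Minari dataset once trajectories have been recorded; it shows only the Minari side, not how Schola records the data. The dataset id schola/maze-demos-v0 is hypothetical, so substitute whatever id you have recorded or downloaded locally.

# Minimal Minari consumption sketch; the dataset id below is hypothetical.
import minari

dataset = minari.load_dataset("schola/maze-demos-v0")
print(dataset.total_episodes, dataset.total_steps)

for episode in dataset.iterate_episodes():
    # Each EpisodeData holds aligned trajectory arrays, ready for offline RL
    # or imitation learning pipelines.
    observations = episode.observations   # length T + 1
    actions = episode.actions             # length T
    rewards = episode.rewards             # length T
    dones = episode.terminations | episode.truncations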
Dynamic agent management
One of the most powerful improvements in AMD Schola v2 is robust support for agents being spawned and deleted mid-episode. Previous versions required either a static set of agents throughout an episode or a predefined spawning function to create them; v2 handles dynamic populations seamlessly.
This enables realistic scenarios like:
Battle Royale / Survival Games - Agents can be eliminated and removed from training without breaking the episode.
Population Simulations - Spawn new agents based on game events or environmental triggers.
Dynamic Team Composition - Add or remove team members on the fly.
Procedural Scenarios - Dynamically create agents as players progress through procedurally generated content.
The system lets you manage lifecycles the way you want: simply mark agents as terminated when they die, or start reporting observations when they spawn. This makes it much easier to build realistic, dynamic environments that mirror actual game scenarios.
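As a sketch of what that looks like from the Python side, the snippet below uses the common multi-agent dict convention (one key per live agent), in which an eliminated agent reports terminated once and then disappears, while a newly spawned agent simply starts appearing in the observation dict. Agent ids and values are illustrative only; this shows the convention, not Schola-specific code.

# Illustrative multi-agent dicts for a dynamic population (values are made up).
# Step t: three agents are alive.
observations = {"runner_0": [0.1, 0.4], "tagger_0": [0.9, 0.2], "tagger_1": [0.3, 0.3]}

# tagger_1 is eliminated this step: it reports terminated=True one last time...
rewards = {"runner_0": 0.05, "tagger_0": 0.0, "tagger_1": -1.0}
terminateds = {"runner_0": False, "tagger_0": False, "tagger_1": True, "__all__": False}

# ...and at step t+1 it is gone, while a freshly spawned tagger_2 starts
# reporting observations mid-episode.
observations = {"runner_0": [0.2, 0.4], "tagger_0": [0.8, 0.2], "tagger_2": [0.5, 0.5]}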
Enhanced command-line interface
Training from the command line is now more intuitive than ever:
# Stable Baselines 3
schola sb3 train ppo ...
schola sb3 export ...
# Ray RLlib
schola rllib train ppo ...
schola rllib export ...
# Utilities
schola compile-proto
schola build-docs
The new CLI, built with cyclopts, provides better error messages, auto-completion support, and a more consistent interface across the different RL frameworks.
Unreal Blueprint improvements
Working in Unreal Engine Blueprints is smoother than ever.
Updated framework support
AMD Schola v2 has been updated to support the latest versions of all major RL frameworks and libraries:
Gymnasium - Full support for the latest Gymnasium API (1.1+).
Ray RLlib New API Stack - Compatible with the latest Ray RLlib features and algorithms.
Stable-Baselines3 2.x - Updated to work with the newest SB3 release.
These updates ensure you can leverage the latest features, bug fixes, and performance improvements from the RL ecosystem while training your agents in Unreal Engine.
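For reference, the sketch below shows the configuration style the Ray RLlib new API stack uses; CartPole-v1 is a placeholder for the environment Schola exposes from Unreal Engine, and the snippet does not go through Schola's own integration.

# Minimal Ray RLlib (new API stack) PPO sketch with a placeholder environment.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(num_env_runners=2)        # parallel sampling workers
    .training(lr=3e-4, train_batch_size=4000)
)

algo = config.build()
for _ in range(3):
    results = algo.train()                 # dict of training metrics
algo.stop()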
Prerequisites