Setting Up Inference

This guide will explain how to use your trained RL agents in inference mode (i.e. without connecting to Python).

This guide assumes you have already done a training run using Schola and have either a saved checkpoint or a model exported to Onnx.

Convert a Checkpoint to Onnx

If you did not export to Onnx during training you will need to convert a checkpoint to Onnx. You can use the following scripts to create an Onnx model from your checkpoint:

schola sb3 export —policy-checkpoint-path <CHECKPOINT_PATH> —output-path <ONNX_PATH> —algorithm <ALGORITHM>

For SB3, <ALGORITHM> must match how the checkpoint was trained (for example PPO or SAC); allowed values are the same as the export command’s --algorithm choices in schola sb3 export --help.

For RLlib, <CHECKPOINT_DIR> is the algorithm checkpoint directory produced by Ray. --output-path is optional; if omitted, ONNX output is written alongside the checkpoint (see schola rllib export --help).

These commands produce an ONNX model in Schola’s export layout for use in the next section.

Load an Onnx Model into Unreal Engine

Once you have your Onnx model you can import it into Unreal Engine by dragging and dropping the .onnx file into the content browser. This will create a new Onnx model data asset in your project.

Setting up Your Unreal Engine Level

Schola’s inference system consists of three main components:

Agent - Any object implementing the IAgent interface that defines observation and action spaces
Policy - A UNNEPolicy that loads your trained ONNX model and performs inference
Stepper - A USimpleStepper (or UPipelinedStepper) that coordinates the observation-inference-action loop

Follow these steps to set up inference in your project:

Step 1: Implement the IAgent Interface

Create a class (Actor, Component, or any UObject) that implements the IAgent interface. You must implement these methods:

Define() - Specify the observation and action spaces for your agent
Observe() - Collect current observations from the environment
Act() - Execute actions provided by the policy
GetStatus() / SetStatus() - Manage agent state

Create a Blueprint class and add the Agent interface. Implement the Define, Observe, and Act events.

UCLASS()
class AMyAgent : public AActor, public IAgent {
GENERATED_BODY()

virtual void
Define_Implementation(FInteractionDefinition &OutDefinition) override;
virtual void
Observe_Implementation(FInstancedStruct &OutObservations) override;
virtual void Act_Implementation(const FInstancedStruct &InAction) override;
};

Step 2: Create and Configure the Policy

Create a UNNEPolicy object and configure it with your ONNX model:

In your Blueprint or C++, create a UNNEPolicy object
Set the Model Data property to the ONNX model data asset you imported
Set the Runtime Name to your desired inference runtime (e.g., “NNERuntimeORTCpu” or “NNERuntimeORTDml”)
Call Init() with the agent’s interaction definition

Add a UNNEPolicy variable to your Blueprint
In BeginPlay, call Define on your agent to get the interaction definition
Call Init on the policy, passing the interaction definition
Set the Model Data and Runtime Name properties in the details panel

UNNEPolicy *Policy = NewObject<UNNEPolicy>(this);
Policy->ModelData = YourOnnxModelDataAsset;
Policy->RuntimeName = TEXT("NNERuntimeORTCpu");

FInteractionDefinition Definition;
IAgent::Execute_Define(YourAgent, Definition);
Policy->Init(Definition);

Step 3: Create and Initialize the Stepper

Create a USimpleStepper to manage the observation-inference-action loop:

Create a USimpleStepper object
Call Init() with your agent(s) and policy
Call Step() each frame (e.g., in Tick()) to run inference

Add a USimpleStepper variable to your Blueprint
In BeginPlay, call Init with an array of agents and your policy
In Tick, call Step on the stepper

USimpleStepper *Stepper = NewObject<USimpleStepper>(this);

TArray<TScriptInterface<IAgent>> Agents;
Agents.Add(YourAgent);

Stepper->Init(Agents, Policy);

// In your Tick function:
Stepper->Step();

For better performance with slower inference, consider using UPipelinedStepper instead of USimpleStepper. The pipelined stepper overlaps observation collection and action execution with inference.