Training NPCs to Play a MultiAgent Game of Tag

In this tutorial, we will create a multi-agent environment where the agents are trained to play a 3v1 game of tag. Specifically, we create one runner agent, which tries to avoid being caught, and three tagger agents, whose goal is to catch the runner. The agents can move forward, left, and right, and can sense both their surrounding objects and the locations of other agents.

The Structure of the Environment in Unreal Engine

To build the game (referred to as the environment hereafter), we work through the following steps in our Unreal Engine project, each covered in the sections below.

Initial Setup

Please refer to the guide on Getting Started with Schola to set up the Unreal Engine project and Schola plugin.

Creating the Custom Direction and Distance Observer

There are a variety of built-in observer classes available in Schola, such as the RotationObserver and RayCastSensor. A custom observer is needed when we require observations not covered by the built-in ones. In this example, we will create a custom observer (inheriting from UActorComponent) that implements the methods defined in the IScholaSensor interface. It allows the taggers to observe the direction and distance of other agents relative to the current agent, returning the distance normalized by the environment size and the direction as a unit vector. The GetObservationSpace() function returns the observation space, and the CollectObservations() function collects and returns the observations.

  1. Create a new Blueprint Class with parent class ActorComponent, and name it DirectionDistanceObserver.
  2. Under Class Settings → Interfaces, add ScholaSensor. Three functions should appear under Interfaces.
  3. Add a new integer variable. Name it EnvSize, and set the default value to 5000. This stores the maximum possible distance between two agents within the environment.
  4. Add a new Actor variable. Name it Target. This stores the target agent that the observer will track.
  5. Set the GetObservationSpace() and CollectObservations() blueprints as shown below.
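
As a rough reference for what those two blueprints compute, here is a hypothetical Python sketch (the 2D layout, names, and gymnasium-style space are our assumptions, not part of the Schola API):

    from gymnasium import spaces
    import numpy as np

    ENV_SIZE = 5000.0  # mirrors the EnvSize variable: the maximum possible distance between two agents

    def get_observation_space():
        # [normalized distance, direction x, direction y]
        return spaces.Box(low=np.array([0.0, -1.0, -1.0]), high=np.array([1.0, 1.0, 1.0]), dtype=np.float32)

    def collect_observations(self_location, target_location):
        # Offset from this agent to the target, assuming movement on the XY plane.
        offset = np.asarray(target_location, dtype=np.float32) - np.asarray(self_location, dtype=np.float32)
        distance = float(np.linalg.norm(offset))
        direction = offset / distance if distance > 0 else np.zeros_like(offset)
        return np.concatenate(([distance / ENV_SIZE], direction))

    # Example: a target 2500 units straight ahead of the agent yields [0.5, 1.0, 0.0].
    print(collect_observations([0.0, 0.0], [2500.0, 0.0]))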

Creating the Agents

Creating the Tagger Class

  1. Create a new Blueprint Class with parent class Character, and name it Tagger.
  2. Add any desired static meshes and material as the agent’s body.
  3. Set Details → Character Movement: Walking → Max Walk Speed to 520 cm/s.
  4. Set Details → Character Movement (Rotation Settings) → Orient Rotation to Movement to true. This allows the agent to rotate using the Movement Input Actuator.
  5. Set Details → Pawn → Use Controller Rotation Yaw to false. This allows the agent to rotate using the Movement Input Actuator.
  6. In Details → Tags, add a new tag, and set the value to Tagger. This tag is used by the RayCastSensor to detect different objects.
  7. Add a new Pawn (Object Reference) variable named Target, and make it publicly editable (by clicking the eye icon to toggle visibility).
  8. Add a new boolean variable named Hit Wall to store whether the tagger agent has hit a wall in the current step.
  9. Add a new boolean variable Caught Target which tracks whether the tagger agent has caught the runner agent in the current step.

Attaching the Ray Cast Observer
  1. Add a RayCastSensor component.
  2. Set Details → Sensor properties → NumRays to 36.
  3. Set Details → Sensor properties → RayDegrees to 360, so the rays cover a full circle around the agent (see the sketch after this list).
  4. Set Details → Sensor properties → RayLength to 2048.
  5. In Details → Sensor properties → TrackedTags, add two new elements and set the tags to Runner and Tagger.
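
With NumRays set to 36 and RayDegrees set to 360, the sensor casts one ray every 10 degrees around the agent, each 2048 units long. A small sketch of that spacing (purely illustrative, not the RayCastSensor implementation):

    import math

    NUM_RAYS, RAY_DEGREES, RAY_LENGTH = 36, 360.0, 2048.0

    # One ray every RAY_DEGREES / NUM_RAYS = 10 degrees, starting from the agent's forward direction.
    ray_directions = [
        (math.cos(math.radians(i * RAY_DEGREES / NUM_RAYS)),
         math.sin(math.radians(i * RAY_DEGREES / NUM_RAYS)))
        for i in range(NUM_RAYS)
    ]
    ray_endpoints = [(RAY_LENGTH * x, RAY_LENGTH * y) for x, y in ray_directions]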

Attaching the Movement Input Actuator

We will use two Movement Input Actuators to move the agent. One lateral axis actuator to steer, and one forward axis actuator to move the agent forward.

  1. Add a Movement Input Actuator component, and name it ForwardAxis.
  2. In Details → Actuator Settings, uncheck HasYDimension and HasZDimension.
  3. Add a Movement Input Actuator component, and name it LateralAxis.
  4. In Details → Actuator Settings, uncheck HasXDimension and HasZDimension.
  5. In Details → Actuator Settings, set MinSpeed to -1.
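
Together, the two actuators give each agent a two-dimensional continuous action: a forward value and a steering value. Assuming the forward axis keeps its default minimum of 0 while the lateral axis spans [-1, 1] after setting MinSpeed to -1 (both assumptions about the component defaults), the combined action space could be sketched as:

    from gymnasium import spaces
    import numpy as np

    # ForwardAxis keeps only its X dimension; LateralAxis keeps only its Y dimension.
    forward_axis = spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)   # assumed default minimum of 0
    lateral_axis = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)  # MinSpeed set to -1

    action_space = spaces.Dict({"ForwardAxis": forward_axis, "LateralAxis": lateral_axis})
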
Attaching the Direction and Distance Observer
  1. Add three DirectionDistanceObserver components, and name them Teammate Sensor 1, Teammate Sensor 2, and Runner Sensor.
  2. The Target variable of each observer will be set later, in the Registering the Agents section.

Creating the Runner Class

The runner is constructed similarly to the tagger but with some minor changes. Please repeat the steps in the Creating the Tagger Class section with the following changes:

  1. Add the same RayCastSensor and MovementInputActuator to the runner class, but not the DirectionDistanceObserver.
  2. Set Details → Character Movement: Walking → Max Walk Speed to 490 cm/s. We make the runner slightly slower than the taggers at first so that the taggers can learn to catch it early in training; if the runner were as fast as or faster than the taggers, they might never catch it and would fail to learn. This speed can be increased manually during training as the taggers improve and consistently catch the slower runner.
  3. In Details → Tags, add a new element, and set the value to Runner. This tag is used by the RayCastSensor to detect different objects.

Creating the Environment Definition

We will create a SetRunnerTagged function in the environment which notifies all the trainers when the runner is caught. The InitializeEnvironment() function binds an OnActorHit event to the runner that calls SetRunnerTagged when the runner comes into contact with a tagger. The Reset() function moves each agent to a random starting location, resets variables, and provides the Initial Agent State when each episode ends. The Step() function applies actions, collects observations, computes statuses and rewards, and packages them into the Agent State structure. Additionally, two optional functions, SetEnvironmentOptions() and SeedEnvironment(), are provided for logging and reproducibility.
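
Before building the blueprint, it helps to see the whole lifecycle at a glance. The skeleton below sketches that flow in illustrative Python; every name in it is hypothetical, and the real implementation is the blueprint described in the rest of this section.

    class TagEnvironmentSketch:
        """Illustrative lifecycle of the environment definition, not the Schola API."""

        def initialize_environment(self):
            # Define per-agent observation/action spaces and bind an OnActorHit-style
            # callback on the runner that calls set_runner_tagged on contact with a tagger.
            ...

        def set_runner_tagged(self):
            # Notify every trainer that the runner has been caught this step.
            self.runner_tagged = True

        def reset(self):
            # Move each agent to a random starting location, clear per-episode variables,
            # and return the initial observation for every agent.
            ...

        def step(self, actions):
            # Apply each agent's actions, collect observations, compute rewards and
            # statuses, and package them into the per-agent state structure.
            ...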

  1. Create a new Blueprint Class with parent class GymConnectorManager, and name it TagEnvironment.
  2. Go to Class Settings → Implemented Interfaces and add MultiAgentScholaEnvironment. Five functions should appear under Interfaces.
  3. Add a new variable named Agents of type Pawn (Object Reference) array, and make it publicly editable (by clicking on the eye icon to toggle the visibility). This keeps track of registered agents belonging to this environment definition.
  4. Add an integer variable called CurrentStep, the episode step counter.
  5. Add another integer variable named MaxSteps. Make it publicly editable, and set the default value to 2000. This stores the maximum number of steps an episode can run before ending. This may be set to a higher value if the tagger is unable to catch the runner within 2000 steps.
  6. Create the SetRunnerTagged function as shown below.
  7. Set the Event Graph as shown below.

Implementing the InitializeEnvironment Function

We will now implement the function that defines the observation and action spaces for each agent. To keep the blueprint relatively manageable, first create two functions titled GetTaggerSpace and GetRunnerSpace, then fill them in as shown below.
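
As a rough guide to what those two functions assemble, a hypothetical gymnasium-style sketch follows; the exact ray-cast encoding depends on the sensor settings, so the shapes and key names below are assumptions:

    from gymnasium import spaces
    import numpy as np

    def ray_cast_space(num_rays=36, values_per_ray=3):
        # values_per_ray is an assumption about how the RayCastSensor encodes each hit.
        return spaces.Box(low=0.0, high=1.0, shape=(num_rays * values_per_ray,), dtype=np.float32)

    def direction_distance_space():
        # [normalized distance, direction x, direction y], as produced by DirectionDistanceObserver.
        return spaces.Box(low=np.array([0.0, -1.0, -1.0]), high=np.array([1.0, 1.0, 1.0]), dtype=np.float32)

    def get_tagger_space():
        # A tagger sees its ray casts plus the three direction/distance observers.
        return spaces.Dict({
            "RayCast": ray_cast_space(),
            "Teammate Sensor 1": direction_distance_space(),
            "Teammate Sensor 2": direction_distance_space(),
            "Runner Sensor": direction_distance_space(),
        })

    def get_runner_space():
        # The runner only has the ray-cast observer.
        return spaces.Dict({"RayCast": ray_cast_space()})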

Now fill in the InitializeEnvironment() function:

Implementing the Reset Function

We will again create some helper functions to keep the Reset blueprint itself small. Create three functions titled GetTaggerInitialObs, GetRunnerInitialObs, and CollectInitialObservations.

Fill them in according to the following blueprints.

Now complete the Reset() function:
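
For orientation, the overall flow the Reset blueprint implements is roughly the following (illustrative Python; the helper and attribute names are hypothetical):

    import random

    def random_spawn_point(half_extent=2000.0):
        # Pick a random XY location inside the play area (half_extent is an assumed map half-size).
        return (random.uniform(-half_extent, half_extent), random.uniform(-half_extent, half_extent))

    def reset_environment(env):
        env.current_step = 0
        env.runner_tagged = False
        initial_obs = {}
        for agent in env.agents:
            agent.location = random_spawn_point()
            agent.hit_wall = False
            agent.caught_target = False
            initial_obs[agent.name] = agent.collect_observations()  # each attached observer contributes
        return initial_obs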

Implementing the Step Function

For the last of the interface functions, we have the Step(). Again, we are going to define several helper functions.

Define the Tagger Reward Function

We give a large one-time reward when the tagger agent catches the runner agent, and a small penalty of 0.02 when the tagger agent hits a wall. Additionally, we apply a small penalty of 0.005 for each step the tagger agent takes, to encourage the agent to catch the runner as quickly as possible. The one-time reward is computed as 10 - (0.005 * DistanceFromRunner), where 10 is the maximum reward for catching the runner, and the 0.005 * DistanceFromRunner term decreases the reward as the tagger gets further from the runner, so taggers near the runner are rewarded more when the runner is caught. These numbers are chosen based on our experience and can be adjusted as needed. The per-step reward is computed as -(0.02 * HitWall) - 0.005.

Create a new function called ComputeTaggerReward as shown below.
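
The same logic, written out as a hypothetical Python function, looks like this; whether the per-step penalty also applies on the catching step is a blueprint detail the sketch omits:

    def compute_tagger_reward(runner_caught: bool, hit_wall: bool, distance_from_runner: float) -> float:
        if runner_caught:
            # One-time reward: larger for taggers closer to the runner at the moment of the catch.
            return 10.0 - 0.005 * distance_from_runner
        # Per-step shaping: wall-hit penalty plus a constant time penalty to encourage a quick catch.
        return -(0.02 * float(hit_wall)) - 0.005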

Define the Tagger Status Function

For taggers, the terminal state is reached when the runner is caught.

Create and set the function ComputeTaggerStatus as shown below.
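
A hypothetical sketch of that logic, assuming the episode is also truncated once MaxSteps is reached and using strings as stand-ins for the actual status values:

    def compute_tagger_status(runner_tagged: bool, current_step: int, max_steps: int) -> str:
        if runner_tagged:
            return "terminated"  # the runner was caught
        if current_step >= max_steps:
            return "truncated"   # episode hit the MaxSteps limit
        return "running"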

Tagger Actions and Observations

To apply the incoming actions to the tagger and collect its observations, create and fill in the function GetTaggerStepPoints as shown below.

Define the Runner Reward Function

We give the runner a large one-time reward of -20 when it is caught and a small constant per-step reward of 0.01 to encourage it to survive as long as possible.

Create the function ComputeRunnerReward as follows.
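
A hypothetical Python equivalent of that blueprint:

    def compute_runner_reward(caught: bool) -> float:
        # Large one-time penalty when caught; small constant bonus for every step survived.
        return -20.0 if caught else 0.01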

Define the Runner Status Function

The runner has the same status function as the tagger, except we cast to a Runner object in order to correctly read the Caught Target variable.

Create ComputeRunnerStatus as shown below.

Runner Actions and Observations

As we did with the tagger, create and fill in the function GetRunnerStepPoints.

Now that all of the supplementary functions have been created, fill in the Step function as follows.
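
For reference, the overall flow of the Step blueprint is roughly the following, reusing the hypothetical reward and status sketches from earlier in this section (agent attributes and helper names are illustrative):

    def step_environment(env, actions):
        env.current_step += 1
        agent_states = {}
        for agent in env.agents:
            agent.apply_actions(actions[agent.name])  # actuators consume this agent's action
            obs = agent.collect_observations()        # observers produce this agent's observation
            if agent.is_tagger:
                reward = compute_tagger_reward(env.runner_tagged, agent.hit_wall, agent.distance_to_runner)
            else:
                reward = compute_runner_reward(env.runner_tagged)
            # The runner's status mirrors the tagger's: terminal when caught, truncated at MaxSteps.
            status = compute_tagger_status(env.runner_tagged, env.current_step, env.max_steps)
            agent_states[agent.name] = (obs, reward, status)
        return agent_states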

Creating the Map

  1. Create a level with a floor and four walls.
  2. Add obstacles and decorations as desired.
  3. Place a TagEnvironment anywhere in the map. The location does not matter.
  4. Place three Taggers near the centre of the map.
  5. Place a Runner near the taggers.

Registering the Agents

  1. Select the TagEnvironment in the map.

  2. Go to Details panel → Default → Agents.

  3. Add 4 new elements, and set their values to the four agents in the map.

  4. Select a tagger in the map.

  5. Go to Details Panel.

  6. Select the Teammate Sensor 1 component, set the Target to one of the other taggers, and repeat this for Teammate Sensor 2.

  7. Select the Runner Sensor component, and set the Target to the runner.

  8. Repeat this for the other two taggers.

Starting Training

We will train the agents using the Proximal Policy Optimization (PPO) algorithm for 2,000,000 steps. Since SB3 does not support multi-agent training, we will use RLlib for this example. Training can be launched either from the terminal or from within the Unreal Editor; both methods run the same training. Running from the terminal may be more convenient for hyperparameter tuning, while running from the Unreal Editor may be more convenient when editing the game. Here we launch it from the terminal.

  1. Run the game in Unreal Engine (by clicking the green triangle).

  2. Open a terminal or command prompt, and run the following command:

    schola rllib train ppo --protocol.port 8000 --training-settings.timesteps 2000000 --network-architecture-settings.use-attention
  3. Gradually increase the runner’s speed in the Runner Blueprint → Character Movement: Walking → Max Walk Speed as the taggers improve and can consistently catch the slower runner.