
Building Ball Shooter

This guide walks you through creating a simple shooting range environment and training an agent to shoot moving targets in Unreal Engine. The goal is to aim correctly before shooting and take down targets one by one.

In this example, we will create a dynamic shooting range environment with a single agent that learns to shoot moving targets using reinforcement learning. The agent interacts with the environment by observing the target’s movements and performing actions through actuators to rotate and shoot.

We will train our agent by having it repeatedly attempt to hit the moving targets. Each attempt is referred to as an episode, and an episode ends when the agent has destroyed all of the targets (each target takes three hits to destroy), or when it runs out of time.

Periodically, the agent will review its performance during previous episodes and update its policy to improve further. To quantify the agent's performance, we define a reward function: hitting a target earns a reward, missing incurs a penalty, and every step taken incurs a small time penalty. The agent can then use the learned policy to decide which actions to take during gameplay.

The Structure of the Environment in Unreal Engine

To build the game (called environment hereafter) where the agent will learn to shoot moving targets, we need the following in our Unreal Engine project:

  • Map: The game map includes the floor, four walls, the agent, and the environment definition.

  • Ball Blueprint: The projectile that the agent shoots, which is spawned by the agent when it takes the shoot action.

  • Target Blueprint: The object that the agent will shoot, which moves randomly around the map and is destroyed when hit three times by a ball.

  • Agent blueprint: A subclass of Pawn, which includes the shape and appearance of the agent.

  • Shooting Actuator: A custom discrete Actuator that allows the agent to shoot the ball.

  • Discrete Rotation Actuator: A custom discrete Actuator that allows the agent to rotate.

  • Trainer blueprint: A subclass of BlueprintTrainer, which includes the logic to compute the reward and status of the training, as well as the Sensors and Actuators.

  • Environment definition: A subclass of BlueprintStaticScholaEnvironment, which includes the logic of initializing and resetting the environment between different episodes of training.

  • Registering the agent: Connect the agent to the environment definition and trainer.

Initial Setup

  1. Create a new blank project with a desired name and location.
  2. Install the Schola plugin into the project by following the /guides/setup_schola guide.
  3. Go to Edit → Project Settings, and scroll down to find Schola. If you don’t see Schola in the Project Settings, check whether the plugin is installed in the Edit → Plugins menu; refer to the /guides/setup_schola guide for more information.

  4. For Gym Connector Class, select Python Gym Connector.

Creating the Map

  1. Create a shooting range with a floor and four walls in the map.
  2. For each wall, go to Details → Tags, add a new element, and set the value to wall. This tag is used by the RayCastObserver to detect different objects.

Creating the Ball

The Ball class is the projectile that the agent shoots. The ball is spawned by the agent when it takes the shooting action and is destroyed upon hitting a wall or target.

  1. Create a new Blueprint Class with parent class Actor, and name it BallShooterBall.

  2. Add a Sphere Static Mesh Component to the blueprint, and optionally select a good-looking material.

    1. Enable Details → Physics → Simulate Physics.
    2. Enable Details → Collision → Simulation Generates Hit Events.
    3. Enable Details → Collision → Generate Overlap Events.
    4. Set Details → Collision → Collision Presets to Custom.
    5. Set Details → Collision → Collision Presets → Collision Enabled to Probe Only. This prevents the ball from blocking the agent’s Ray Cast Observer vision.
  3. Add a Sphere Collision Component, making it slightly larger than the Sphere.

  4. Scale the DefaultSceneRoot to 0.5x0.5x0.5.

Creating the Target

The target is the object that the agent will shoot. The target moves randomly around the map and is destroyed after being hit three times by a ball. The Event Tick applies a random force to the target to move it around the map. The OnTakeAnyDamage_Event fires when the target is hit by a ball, decrements the target’s hit points, and destroys the target when they reach zero.

  1. Create a new Blueprint Class with parent class Actor, and name it BallShooterTarget.

  2. Add a Sphere Static Mesh Component to the blueprint, and optionally select a good-looking material.

    1. Enable Details → Physics → Simulate Physics.
    2. Enable Details → Collision → Simulation Generates Hit Events.
    3. Enable Details → Collision → Generate Overlap Events.
    4. Set Details → Collision → Collision Presets to PhysicsActor.
  3. Add a Sphere Collision Component, making it slightly larger than the Sphere.

  4. Scale the DefaultSceneRoot to 3x3x3.

  5. Add a new boolean variable. Name it isHit. It stores whether the target was hit by a ball in the current step.

  6. Add a new Transform variable. Name it initialTransform. It stores the initial transform of the target when the episode starts.

  7. Add a new integer variable. Name it hitPoint, and set the default value to 3. It stores the target’s remaining hit points. The target is destroyed when this reaches zero.

  8. Add a new float variable. Name it forceMagnitude, and set the default value to 50. It stores the magnitude of the random force applied to the target on each tick.

  9. Create a new function teleportToRandomLocation as shown below, and set the Make Vector node’s random ranges to match the bounds of the shooting range. This function teleports the target to a random location within the shooting range.

  10. Set the Event Graph as shown below. (A rough C++ sketch of the same logic follows this list.)

    1. The Event Begin Play will save the initial transform of the target and bind the OnTakeAnyDamage_Event once.
    2. The OnTakeAnyDamage_Event fires when the target is hit by a ball, decrements the target’s hit points, and destroys the target when they reach zero.
    3. The Event Tick will apply a random force to the target to move it around the shooting range.
  11. In Details → Tags, add a new element, and set the value to target. This tag is used by the RayCastObserver to detect different objects.
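
The Blueprint graphs referenced above are shown as images in the original guide. As a point of reference, here is a rough C++ transliteration of the same target logic. This is an illustrative sketch only, not the tutorial’s implementation: the class boilerplate is abbreviated, and the ±1000 teleport bounds are placeholder values that should match your shooting range.

    // Illustrative sketch only: the tutorial builds this logic in Blueprint.
    #include "GameFramework/Actor.h"
    #include "Components/StaticMeshComponent.h"
    #include "BallShooterTarget.generated.h"

    UCLASS()
    class ABallShooterTarget : public AActor
    {
        GENERATED_BODY()

    public:
        UPROPERTY(EditAnywhere) int32 HitPoint = 3;          // hits remaining
        UPROPERTY(EditAnywhere) float ForceMagnitude = 50.f; // per-tick push strength
        UPROPERTY(VisibleAnywhere) UStaticMeshComponent* Mesh = nullptr;
        FTransform InitialTransform;
        bool bIsHit = false;

        virtual void BeginPlay() override
        {
            Super::BeginPlay();
            // Event Begin Play: save the spawn transform and bind the damage handler once.
            InitialTransform = GetActorTransform();
            OnTakeAnyDamage.AddDynamic(this, &ABallShooterTarget::HandleDamage);
        }

        virtual void Tick(float DeltaSeconds) override
        {
            Super::Tick(DeltaSeconds);
            // Event Tick: push the target in a random direction.
            Mesh->AddForce(FMath::VRand() * ForceMagnitude, NAME_None, /*bAccelChange=*/true);
        }

        UFUNCTION()
        void HandleDamage(AActor* DamagedActor, float Damage, const UDamageType* DamageType,
                          AController* InstigatedBy, AActor* DamageCauser)
        {
            // OnTakeAnyDamage_Event: lose one hit point per ball; destroy at zero.
            bIsHit = true;
            if (--HitPoint <= 0)
            {
                Destroy();
            }
        }

        void TeleportToRandomLocation()
        {
            // Placeholder bounds; match these to your map.
            const FVector NewLocation(FMath::RandRange(-1000.f, 1000.f),
                                      FMath::RandRange(-1000.f, 1000.f), 100.f);
            SetActorLocation(NewLocation, false, nullptr, ETeleportType::TeleportPhysics);
        }
    };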

Creating the Agent

  1. Create a new Blueprint Class with parent class Pawn, and name it BallShooterAgent.
  2. Add any desired static meshes as the agent’s body, and optionally select good-looking materials.
  3. Add an Arrow Component, and set it as the agent’s forward direction. Name it Projectile Indicator.
  4. Save and close the blueprint, and place a BallShooterAgent at the center of the map.

Creating the Ball Shooter Shooting Actuator

There are a variety of built-in actuator classes available in Schola, such as the TeleportActuator and MovementInputActuator. However, some games may require custom actuators. In this example, we will create a custom BlueprintDiscreteActuator (subclass of DiscreteActuator) to shoot the ball. This actuator has two possible actions: shoot a ball or do nothing. The GetActionSpace function will return the action space, and the TakeAction function will take the action. We will also create two helper functions, getInitialVelocity() and getInitialLocation(), to get the initial velocity and location for spawning the ball.

A BinaryActuator can also be used here instead of the DiscreteActuator.

  1. Create a new Blueprint Class with parent class BlueprintDiscreteActuator, and name it BallShooterShootingActuator.
  2. Add a new float variable. Name it ballSpawnSpeed, and set the default value to 2000. This stores the speed of the ball when shot.
  3. Add a new Rotator variable. Name it projectileSpawnDirection. This stores the direction in which the ball will be spawned. Adjust the values to ensure the ball is spawned in the correct direction.
  4. Add a new float variable. Name it ballSpawnOffset. This stores the offset from the agent’s location where the ball will be spawned. Set the default value to 200, and adjust if necessary to ensure the ball is spawned in front of, not inside the agent.
  5. Add a new integer variable. Name it countOfBallsShot. It stores the number of balls shot by the agent in the current time step.
  6. Add a new Actor variable. Name it Agent. This stores the agent that owns the actuator.
  7. Convert the function TakeAction into an event. This allows us to bind the Ball Hit Event to the spawned ball.
  8. Set the getInitialVelocity(), getInitialLocation(), GetActionSpace, and TakeAction blueprints as shown below.
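
Those graphs are shown as images in the original guide. The sketch below transcribes their logic into C++ for reference. It is illustrative only: the actuator’s variables are passed as parameters here, whereas the Blueprint stores them as the member variables created above, BallClass is an assumed property holding the ball class, and Schola’s actual actuator base-class signatures are not reproduced.

    // Illustrative sketch only; the tutorial builds these graphs in Blueprint.
    // GetActionSpace declares a discrete space with two actions:
    // 0 = do nothing, 1 = shoot a ball.
    #include "GameFramework/Actor.h"
    #include "Components/StaticMeshComponent.h"

    FVector GetInitialLocation(const AActor* Agent, float BallSpawnOffset)
    {
        // Spawn in front of the agent so the ball is not inside its body.
        return Agent->GetActorLocation() + Agent->GetActorForwardVector() * BallSpawnOffset;
    }

    FVector GetInitialVelocity(const AActor* Agent, const FRotator& ProjectileSpawnDirection,
                               float BallSpawnSpeed)
    {
        // Fire along the agent's facing, adjusted by projectileSpawnDirection.
        return (Agent->GetActorRotation() + ProjectileSpawnDirection).Vector() * BallSpawnSpeed;
    }

    void TakeAction(AActor* Agent, TSubclassOf<AActor> BallClass, int32 Action,
                    float BallSpawnOffset, const FRotator& ProjectileSpawnDirection,
                    float BallSpawnSpeed, int32& CountOfBallsShot)
    {
        if (Action != 1) { return; } // action 0: do nothing

        AActor* Ball = Agent->GetWorld()->SpawnActor<AActor>(
            BallClass, GetInitialLocation(Agent, BallSpawnOffset), Agent->GetActorRotation());
        if (!Ball) { return; }
        if (UPrimitiveComponent* Sphere = Ball->FindComponentByClass<UStaticMeshComponent>())
        {
            Sphere->SetPhysicsLinearVelocity(
                GetInitialVelocity(Agent, ProjectileSpawnDirection, BallSpawnSpeed));
        }
        // The Blueprint also binds the trainer's On Ball Hit event to the spawned ball here.
        CountOfBallsShot++; // read by the trainer's ComputeReward this step
    }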

Creating the Ball Shooter Discrete Rotation Actuator

Although the RotationActuator exists in Schola and can be used to rotate agents continuously, we will create another custom BlueprintDiscreteActuator (subclass of DiscreteActuator) to rotate the agent. This actuator has three possible actions: rotate left, rotate right, or do nothing.

Mixing discrete and continuous actuators on the same agent should be avoided. The Stable-Baselines3 library, and most algorithms in general, do not support mixed discrete and continuous action spaces. Although some workarounds exist, mixing may cause bugs or reduce training performance. In contrast, mixing discrete and continuous observers is fully supported.

  1. Create a new Blueprint Class with parent class BlueprintDiscreteActuator, and name it BallShooterDiscreteRotationActuator.
  2. Add a new float variable. Name it rotationMagnitude, and set the default value to 2. This stores the magnitude of the rotation when the agent rotates.
  3. Set the GetActionSpace and TakeAction blueprints as shown below.
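
As with the shooting actuator, these graphs are images in the original guide. The short C++ sketch below mirrors the TakeAction logic (GetActionSpace would declare a discrete space of size three). It is illustrative only and does not reproduce Schola’s actual actuator base-class signatures.

    // Illustrative sketch only. Actions: 0 = do nothing, 1 = rotate left,
    // 2 = rotate right, each by rotationMagnitude degrees of yaw.
    #include "GameFramework/Actor.h"

    void TakeRotationAction(AActor* Agent, int32 Action, float RotationMagnitude)
    {
        float YawDelta = 0.f;
        if (Action == 1)      { YawDelta = -RotationMagnitude; } // left
        else if (Action == 2) { YawDelta =  RotationMagnitude; } // right
        Agent->AddActorLocalRotation(FRotator(0.f, YawDelta, 0.f));
    }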

Creating the Trainer

To train an agent in Schola, the agent must be controlled by an AbstractTrainer, which defines the ComputeReward and ComputeStatus functions. In this tutorial, we will be creating a BlueprintTrainer (subclass of AbstractTrainer).

  1. Create a new Blueprint Class with parent class BlueprintTrainer, and name it BallShooterTrainer.
  2. Add a new integer variable. Name it maxNumberOfHitsPerEpisode. It stores the maximum number of times the agent can hit the target in one episode, which is the number of targets multiplied by the number of hitpoints for each target. It is set by the Environment Definition blueprint.
  3. Add a new integer variable. Name it numOfHitsThisEpisode. It stores the number of times the agent has hit the target in the current episode. It is used to determine when the episode ends.
  4. Add a new integer variable. Name it numOfTargetHits. It stores the number of times the agent has hit the target in the current step.
  5. Add an Actuator component, and set Details → Actuator Component → Actuator to BallShooterShootingActuator.
  6. Set the Event Graph as shown below. This binds the On Ball Hit event to any balls spawned by the agent’s actuator, allowing the trainer to detect when the agent hits or misses the target.
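
The Event Graph itself is an image in the original guide. The sketch below shows, in illustrative C++, the kind of handler the trainer binds to each spawned ball; the member names mirror the variables created above, the binding is assumed to go through the ball’s OnActorHit delegate, and Schola’s actual Blueprint event wiring is not reproduced.

    // Illustrative sketch only; the tutorial binds this in Blueprint.
    // Must be declared as a UFUNCTION() on the trainer to bind dynamically,
    // e.g.: Ball->OnActorHit.AddDynamic(this, &...::OnBallHit);
    #include "Kismet/GameplayStatics.h"

    void OnBallHit(AActor* SelfActor, AActor* OtherActor, FVector NormalImpulse,
                   const FHitResult& Hit)
    {
        if (OtherActor && OtherActor->ActorHasTag(TEXT("target")))
        {
            NumOfTargetHits++;        // counted as a hit this step
            NumOfHitsThisEpisode++;   // counted toward ending the episode
            // Deal 1 damage so the target's OnTakeAnyDamage_Event fires.
            UGameplayStatics::ApplyDamage(OtherActor, 1.f, nullptr, SelfActor, nullptr);
        }
        SelfActor->Destroy();         // the ball is destroyed on any hit
    }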

Attaching Actuators and Observers

Unlike Example 1, actuators and observers will not be attached to the agent blueprint. Instead, they will be attached in the Trainer blueprint. This approach simplifies passing variables, as the Trainer's ComputeReward and ComputeStatus logic relies on variables from the BallShooterShootingActuator.

Actuator objects can be attached in three ways:

  1. Attaching an ActuatorComponent to the agent, which can contain an Actuator object.
  2. Attaching an ActuatorComponent to the BlueprintTrainer, which can contain an Actuator object.
  3. Adding an Actuator object directly to the Actuators array in the BlueprintTrainer.

Attaching the Ball Shooter Shooting Actuator

  1. Add an Actuator component.
  2. In Details → Actuator Component → Actuator, select BallShooterShootingActuator.

Attaching the Ball Shooter Discrete Rotation Actuator

  1. Add an Actuator component.
  2. In Details → Actuator Component → Actuator, select BallShooterDiscreteRotationActuator.

Attaching the Ray Cast Observer

  1. Add a Sensor component.
  2. In Details → Sensor → Observer, select Ray Cast Observer.
  3. In Details → Sensor → Observer → Sensor properties → NumRays, enter 10.
  4. In Details → Sensor → Observer → Sensor properties → RayDegrees, enter 120.
  5. In Details → Sensor → Observer → Sensor properties, check the DrawDebugLines box.
  6. In Details → Sensor → Observer → Sensor properties → TrackedTags, add a new element, and set the tag to target.

Define the Reward Function

In this tutorial, we give a reward of 1 for each ball that hits a target and a penalty of -0.01 for each ball that misses. Additionally, we give a penalty of -0.05 for each step the agent takes, to encourage the agent to destroy all targets and end the episode as quickly as possible. Since the number of misses is countOfBallsShot - numOfTargetHits, the per-step reward simplifies to (1.01*numOfTargetHits - 0.01*countOfBallsShot) - 0.05. For example, a step in which the agent shoots two balls and hits once yields 1.01*1 - 0.01*2 - 0.05 = 0.94.

  1. Add a new float variable. Name it reward. It stores the reward for the current step.
  2. Set the ComputeReward function as shown below.
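
For reference, the per-step computation can be written out as a plain function. This is a sketch of the logic in the ComputeReward graph above, not Schola’s actual function signature.

    // Sketch of the ComputeReward logic; int32/float are Unreal's numeric types.
    float ComputeReward(int32 NumOfTargetHits, int32 CountOfBallsShot)
    {
        // +1 per hit, -0.01 per miss (ballsShot - hits), -0.05 per step:
        // 1*hits - 0.01*(ballsShot - hits) - 0.05 = 1.01*hits - 0.01*ballsShot - 0.05
        return 1.01f * NumOfTargetHits - 0.01f * CountOfBallsShot - 0.05f;
    }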

Define the Status Function

There are three possible statuses for each time step:

  1. Running: The episode is still ongoing, and the agent continues interacting with the environment.
  2. Completed: The agent has successfully reached a terminal state, completing the episode.
  3. Truncated: The episode has been forcefully ended, often due to external limits like time steps or manual intervention, without reaching the terminal state.

In this tutorial, the terminal state is reached when the agent destroys all targets, which is when the numOfTargetHits is equal to the maxNumberOfHitsPerEpisode. We also set a max step to prevent an episode from running indefinitely.

  1. Add a new integer variable. Name it maxStep, and set the default value to 1000. This means an episode is truncated if it reaches 1000 time steps without completing. You may adjust this number if you want to allow longer or shorter episodes due to factors such as the size of the environment or the speed of the agent.
  2. Set the ComputeStatus as shown below.

The Step variable is part of the BlueprintTrainer; it tracks the number of steps taken since the last ResetEnvironment call.
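
Written out as plain logic, ComputeStatus reduces to two comparisons. The sketch below is illustrative only; Schola defines its own status type, and the enum name here is an assumption.

    // Sketch of the ComputeStatus logic; the enum name is illustrative only.
    enum class ETrainingStatus { Running, Completed, Truncated };

    ETrainingStatus ComputeStatus(int32 NumOfHitsThisEpisode, int32 MaxNumberOfHitsPerEpisode,
                                  int32 Step, int32 MaxStep)
    {
        if (NumOfHitsThisEpisode >= MaxNumberOfHitsPerEpisode)
        {
            return ETrainingStatus::Completed; // all targets destroyed
        }
        if (Step >= MaxStep)
        {
            return ETrainingStatus::Truncated; // ran out of time
        }
        return ETrainingStatus::Running;
    }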

Creating the Environment Definition

To train an agent in Schola, the game must have a StaticScholaEnvironment Unreal object, which contains the agent and the logic for initializing and resetting the game environment. In this tutorial, we will be creating a BlueprintStaticScholaEnvironment (subclass of StaticScholaEnvironment) as the environment definition. The InitializeEnvironment function is called at the start of the game and sets the initial state of the environment; in this tutorial, we save the initial transform (position and rotation) of the agent. The ResetEnvironment function is called before every new episode; in this tutorial, we reset the agent to its initial transform, clean up any leftover balls and targets, spawn three new targets, calculate the totalHitPoints for the episode, and reset the variables in the trainer.

  1. Create a new Blueprint Class with parent class BlueprintStaticScholaEnvironment, and name it BallShooterEnvironment.

  2. Add a new variable named agentArray of type Pawn (Object Reference) array. This keeps track of registered agents belonging to this environment definition.

    1. Make this variable publicly editable (by clicking on the eye icon to toggle the visibility).
  3. Add a new Transform variable named agentInitialLocation. This is for storing the initial position and rotation of the agent, so it can be restored upon reset.

  4. Add a new integer variable named numberOfTargets, and set the default value to 3. This stores the number of targets to spawn in the environment.

  5. Add a new integer variable named totalHitPoints. This stores the total number of hit points for the episode, which is the number of targets multiplied by the number of hitpoints for each target.

  6. Add a new variable named Targets of type Ball Shooter Target (Object Reference) array. This stores the spawned targets in the environment.

  7. Create the functions saveAgentInitialTransform and placeAgentToInitialTransform as shown below. These save the initial transform of the agent and return the agent to that transform when a new episode starts.

  8. Set the Event Graph and RegisterAgents function as shown below.

  9. Save and close the blueprint, and place a BallShooterEnvironment anywhere in the map. The location does not matter.
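
The Event Graph and RegisterAgents function are shown as images in the original guide. The sketch below gives an illustrative C++ rendering of the reset flow described above; TargetClass is an assumed property holding the BallShooterTarget class, and Schola’s actual environment base-class signatures are not reproduced.

    // Illustrative sketch only; the tutorial builds this in Blueprint.
    #include "Kismet/GameplayStatics.h"

    void ResetEnvironment()
    {
        // Return the agent to the transform saved by InitializeEnvironment.
        PlaceAgentToInitialTransform();

        // Clean up any leftover balls and targets from the previous episode.
        TArray<AActor*> Leftovers;
        UGameplayStatics::GetAllActorsOfClass(GetWorld(), ABallShooterBall::StaticClass(), Leftovers);
        for (AActor* A : Leftovers) { A->Destroy(); }
        UGameplayStatics::GetAllActorsOfClass(GetWorld(), ABallShooterTarget::StaticClass(), Leftovers);
        for (AActor* A : Leftovers) { A->Destroy(); }

        // Spawn fresh targets at random locations and recompute the hit budget.
        Targets.Empty();
        for (int32 i = 0; i < NumberOfTargets; ++i)
        {
            ABallShooterTarget* Target = GetWorld()->SpawnActor<ABallShooterTarget>(TargetClass);
            Target->TeleportToRandomLocation();
            Targets.Add(Target);
        }
        TotalHitPoints = NumberOfTargets * 3; // 3 hit points per target

        // Finally, reset the trainer's per-episode counters
        // (numOfHitsThisEpisode, numOfTargetHits, maxNumberOfHitsPerEpisode).
    }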

Registering the Agent

  1. Click on the BallShooterEnvironment in the map.

    1. Go to the Details panel → Default → Agent Array.
    2. Add a new element.
    3. Select BallShooterAgent in the drop-down menu.
  2. Open the BallShooterAgent class in the blueprint editor.

    1. Go to Details Panel.
    2. Search for AIController.
    3. In the drop-down, select BallShooterTrainer.

Starting Training

We will train the agent using the Proximal Policy Optimization (PPO) algorithm for 100,000 steps. The following two methods run the same training. Running from the terminal may be more convenient for hyperparameter tuning, while running from the Unreal Editor may be more convenient when editing the game.

  1. Run the game in Unreal Engine (by clicking the green triangle).
  2. Open a terminal or command prompt, and run the following command:
schola-sb3 -p 8000 -t 100000 PPO

To run with RLlib, use the schola-rllib command instead of schola-sb3.

schola-rllib -p 8000 -t 100000

Enabling TensorBoard

TensorBoard is a visualization tool provided by TensorFlow that allows you to track and visualize metrics such as loss and reward during training.

Add the --enable-tensorboard flag to the command to enable TensorBoard. The --log-dir flag sets the directory where the logs are saved.

schola-sb3 -p 8000 -t 100000 --enable-tensorboard --log-dir experiment_ball_shooter PPO

Running with RLlib using schola-rllib already enables TensorBoard by default.

After training, you can view the training progress in TensorBoard by running the following command in the terminal or command prompt. Make sure TensorBoard is installed first, and set --logdir to the directory where the logs were saved.

tensorboard --logdir experiment_ball_shooter/PPO_1

Logs for subsequent schola-sb3 runs will be in PPO_2, PPO_3, etc.

If you are running with RLlib, the logs will be saved in the ckpt/PPO_timestamp directory.