
schola-rllib Command

This script trains an RLlib model using Schola, allowing customization of training, logging, network architecture, and resource allocation through command-line arguments.

Usage

usage: schola-rllib [-h] [--launch-unreal] [--executable-path EXECUTABLE_PATH] [--headless] [-p PORT] [--map MAP] [--fps FPS] [--disable-script] [-t TIMESTEPS]
[--learning-rate LEARNING_RATE] [--minibatch-size MINIBATCH_SIZE] [--train-batch-size-per-learner TRAIN_BATCH_SIZE_PER_LEARNER] [--num-sgd-iter NUM_SGD_ITER]
[--gamma GAMMA] [-scholav SCHOLA_VERBOSITY] [-rllibv RLLIB_VERBOSITY] [--enable-checkpoints] [--checkpoint-dir CHECKPOINT_DIR] [--save-freq SAVE_FREQ]
[--name-prefix NAME_PREFIX_OVERRIDE] [--export-onnx] [--save-final-policy] [--resume-from RESUME_FROM] [--fcnet-hiddens FCNET_HIDDENS [FCNET_HIDDENS ...]]
[--num-workers NUM_WORKERS] [--num-envs-per-worker NUM_ENVS_PER_WORKER] [--num-cpus-per-worker NUM_CPUS_PER_WORKER] [--num-gpus NUM_GPUS]
{PPO,APPO,IMPALA} ...
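A minimal invocation, as a sketch, might look like the following. The executable path and map name are placeholders for your own packaged project; everything not set explicitly falls back to the defaults documented below.

```shell
# Launch a packaged Unreal project headlessly and train with PPO,
# using the default 1,000,000 timesteps. Paths are placeholders.
schola-rllib --launch-unreal \
  --executable-path /path/to/MyProjectServer \
  --headless \
  --map MyTrainingMap \
  PPO
```

Note that the algorithm sub-command (`PPO`, `APPO`, or `IMPALA`) is positional and comes after the global options.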

Unreal Process Arguments

  • --launch-unreal - Launch Unreal Engine automatically
      • Default: False
      • Required: False

  • --executable-path - Path to the Unreal Engine executable
      • Type: str
      • Required: False

  • --headless - Run Unreal Engine in headless mode
      • Default: False
      • Required: False

  • -p, --port - Port for Unreal Engine communication
      • Default: 15151
      • Type: int
      • Required: False

  • --map - Map to load in Unreal Engine
      • Type: str
      • Required: False

  • --fps - Target FPS for Unreal Engine
      • Default: 60
      • Type: int
      • Required: False

  • --disable-script - Disable script execution in Unreal Engine
      • Default: False
      • Required: False

Training Arguments

  • -t, --timesteps - Number of timesteps to train
      • Default: 1000000
      • Type: int
      • Required: False

  • --learning-rate - Learning rate for training
      • Default: 0.0003
      • Type: float
      • Required: False

  • --minibatch-size - Minibatch size for training
      • Default: 32
      • Type: int
      • Required: False

  • --train-batch-size-per-learner - Training batch size per learner
      • Default: 500
      • Type: int
      • Required: False

  • --num-sgd-iter - Number of SGD iterations
      • Default: 10
      • Type: int
      • Required: False

  • --gamma - Discount factor for future rewards
      • Default: 0.99
      • Type: float
      • Required: False
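A sketch of overriding the training hyperparameters; the specific values here are illustrative, not recommendations:

```shell
# Train for 5M timesteps with a lower learning rate, a larger
# minibatch, and a higher discount factor than the defaults.
schola-rllib -t 5000000 \
  --learning-rate 0.0001 \
  --minibatch-size 64 \
  --train-batch-size-per-learner 1000 \
  --gamma 0.995 \
  PPO
```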

Logging Arguments

  • -scholav, --schola-verbosity - Verbosity level for the Schola environment
      • Default: 0
      • Type: int
      • Required: False

  • -rllibv, --rllib-verbosity - Verbosity level for RLlib
      • Default: 1
      • Type: int
      • Required: False

Checkpoint Arguments

  • --enable-checkpoints - Enable saving checkpoints
      • Default: False
      • Required: False

  • --checkpoint-dir - Directory to save checkpoints
      • Default: './ckpt'
      • Type: str
      • Required: False

  • --save-freq - Frequency with which to save checkpoints
      • Default: 100000
      • Type: int
      • Required: False

  • --name-prefix - Override the name prefix for the checkpoint files (e.g. SAC, PPO, etc.)
      • Type: str
      • Required: False

  • --export-onnx - Export the model to ONNX format instead of just saving a checkpoint
      • Default: False
      • Required: False

  • --save-final-policy - Save the final policy after training is complete
      • Default: False
      • Required: False

  • --resume-from - Path to a saved model to resume training from
      • Type: str
      • Required: False
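These flags can be combined to checkpoint periodically and keep an exportable final policy. A sketch, with an illustrative save frequency and a placeholder resume path:

```shell
# Checkpoint every 250k timesteps and also export the final
# policy to ONNX when training completes.
schola-rllib --enable-checkpoints \
  --checkpoint-dir ./ckpt \
  --save-freq 250000 \
  --save-final-policy \
  --export-onnx \
  PPO

# Resume a previous run; <checkpoint> is a placeholder for an
# actual saved model under the checkpoint directory.
schola-rllib --resume-from ./ckpt/<checkpoint> PPO
```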

Network Architecture Arguments

  • --fcnet-hiddens - Fully connected network hidden layer sizes
      • Type: int (multiple values allowed)
      • Required: False
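Because `--fcnet-hiddens` accepts multiple values, each integer becomes one hidden layer. For example (layer sizes here are illustrative):

```shell
# A policy network with two fully connected hidden layers
# of 256 units each.
schola-rllib --fcnet-hiddens 256 256 PPO
```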

Resource Arguments

  • --num-workers - Number of worker processes
      • Default: 2
      • Type: int
      • Required: False

  • --num-envs-per-worker - Number of environments per worker
      • Default: 1
      • Type: int
      • Required: False

  • --num-cpus-per-worker - Number of CPUs per worker
      • Default: 1
      • Type: int
      • Required: False

  • --num-gpus - Number of GPUs to use
      • Default: 0
      • Type: int
      • Required: False
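A sketch of scaling out data collection; worker and GPU counts are illustrative and should be sized to your machine:

```shell
# 4 rollout workers with 2 environments each (8 environments
# total), one CPU per worker, and one GPU for learning.
schola-rllib --num-workers 4 \
  --num-envs-per-worker 2 \
  --num-cpus-per-worker 1 \
  --num-gpus 1 \
  PPO
```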

Sub-commands

PPO

Proximal Policy Optimization algorithm.

Optional Arguments

  • Algorithm-specific arguments for PPO configuration

APPO

Asynchronous Proximal Policy Optimization algorithm.

Optional Arguments

  • Algorithm-specific arguments for APPO configuration

IMPALA

Importance Weighted Actor-Learner Architecture algorithm.

Optional Arguments

  • Algorithm-specific arguments for IMPALA configuration