schola-rllib Command

This script trains an RLlib model using Schola, allowing customization of training, logging, network architecture, and resource allocation through command-line arguments.

Usage

usage: schola-rllib [-h] [--launch-unreal] [--executable-path EXECUTABLE_PATH] [--headless] [-p PORT] [--map MAP] [--fps FPS] [--disable-script] [-t TIMESTEPS]
                    [--learning-rate LEARNING_RATE] [--minibatch-size MINIBATCH_SIZE] [--train-batch-size-per-learner TRAIN_BATCH_SIZE_PER_LEARNER] [--num-sgd-iter NUM_SGD_ITER]
                    [--gamma GAMMA] [-scholav SCHOLA_VERBOSITY] [-rllibv RLLIB_VERBOSITY] [--enable-checkpoints] [--checkpoint-dir CHECKPOINT_DIR] [--save-freq SAVE_FREQ]
                    [--name-prefix NAME_PREFIX_OVERRIDE] [--export-onnx] [--save-final-policy] [--resume-from RESUME_FROM] [--fcnet-hiddens FCNET_HIDDENS [FCNET_HIDDENS ...]]
                    [--num-workers NUM_WORKERS] [--num-envs-per-worker NUM_ENVS_PER_WORKER] [--num-cpus-per-worker NUM_CPUS_PER_WORKER] [--num-gpus NUM_GPUS]
                    {PPO,APPO,IMPALA} ...
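The synopsis puts global options before the algorithm sub-command. The Python sketch below assembles an argument list in that order: global options, then PPO, APPO or IMPALA, then the sub-command's own options. It assumes the usual argparse sub-command behaviour, in which options written after the sub-command name are parsed by the sub-command. build_command and _append_option are hypothetical helpers, and the flag values are arbitrary examples.

```python
import shlex


def _append_option(argv: list[str], flag: str, value) -> None:
    """Append one option: True means a bare switch, False/None means leave it out."""
    if value is True:
        argv.append(flag)
    elif value is False or value is None:
        return
    elif isinstance(value, (list, tuple)):  # multi-value flag such as --fcnet-hiddens
        argv.append(flag)
        argv.extend(str(v) for v in value)
    else:
        argv.extend([flag, str(value)])


def build_command(global_opts: dict, algorithm: str, algo_opts: dict) -> list[str]:
    """Hypothetical helper: global options, then the sub-command, then its options."""
    if algorithm not in {"PPO", "APPO", "IMPALA"}:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    argv = ["schola-rllib"]
    for flag, value in global_opts.items():
        _append_option(argv, flag, value)
    argv.append(algorithm)  # everything after this is parsed by the sub-command
    for flag, value in algo_opts.items():
        _append_option(argv, flag, value)
    return argv


cmd = build_command(
    {"-t": 200000, "--learning-rate": 0.0003, "--fcnet-hiddens": [256, 256], "--enable-checkpoints": True},
    "PPO",
    {"--clip-param": 0.2},
)
print(shlex.join(cmd))
```

Passing cmd to subprocess.run would start training; the sketch only prints the command so that it runs without Schola installed.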

Unreal Process Arguments

--launch-unreal

Launch Unreal Engine automatically

  • Default: False
  • Required: False

--executable-path

Path to the Unreal Engine executable

  • Type: str
  • Required: False

--headless

Run Unreal Engine in headless mode

  • Default: False
  • Required: False

-p, --port

Port for Unreal Engine communication

  • Default: 15151
  • Type: int
  • Required: False

--map

Map to load in Unreal Engine

  • Type: str
  • Required: False

--fps

Target FPS for Unreal Engine

  • Default: 60
  • Type: int
  • Required: False

--disable-script

Disable script execution in Unreal Engine

  • Default: False
  • Required: False
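If you start Unreal Engine yourself rather than passing --launch-unreal, it can help to confirm that something is accepting connections on the communication port before training starts. This is a minimal sketch, assuming Unreal listens on localhost at the port given by -p/--port (15151 by default). port_is_listening is a hypothetical helper, and a successful TCP connection only shows that some process is listening, not that it is Schola.

```python
import socket


def port_is_listening(port: int = 15151, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only shows that *something* is listening; it does not confirm it is Schola.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False


if __name__ == "__main__":
    port = 15151  # documented default for -p/--port
    if not port_is_listening(port):
        print(f"Nothing is listening on port {port}; start Unreal first or pass --launch-unreal.")
```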

Training Arguments

-t, --timesteps

Number of timesteps to train

  • Default: 1000000
  • Type: int
  • Required: False

--learning-rate

Learning rate for training

  • Default: 0.0003
  • Type: float
  • Required: False

--minibatch-size

Minibatch size for training

  • Default: 32
  • Type: int
  • Required: False

--train-batch-size-per-learner

Training batch size per learner

  • Default: 500
  • Type: int
  • Required: False

--num-sgd-iter

Number of SGD iterations

  • Default: 10
  • Type: int
  • Required: False
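The three batch settings interact. The sketch below estimates the bookkeeping, assuming a single learner that gathers train_batch_size_per_learner timesteps per training iteration, splits them into minibatches of minibatch_size, and makes num_sgd_iter passes over them. This is the usual PPO-style scheme, but the exact number of gradient steps RLlib performs can differ slightly (for example in how a final partial minibatch is handled), so treat the results as estimates. estimate_updates is a hypothetical helper.

```python
import math


def estimate_updates(timesteps=1_000_000, train_batch_size_per_learner=500,
                     minibatch_size=32, num_sgd_iter=10):
    """Rough update counts, assuming one learner and PPO-style minibatch SGD."""
    iterations = math.ceil(timesteps / train_batch_size_per_learner)
    minibatches_per_pass = math.ceil(train_batch_size_per_learner / minibatch_size)
    grad_steps_per_iteration = minibatches_per_pass * num_sgd_iter
    return {
        "training_iterations": iterations,
        "minibatches_per_pass": minibatches_per_pass,
        "grad_steps_per_iteration": grad_steps_per_iteration,
        "total_grad_steps": iterations * grad_steps_per_iteration,
    }


# With the documented defaults: 2000 iterations, 16 minibatches per pass,
# 160 gradient steps per iteration.
print(estimate_updates())
```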

--gamma

Discount factor for future rewards

  • Default: 0.99
  • Type: float
  • Required: False
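--gamma is the discount factor γ: a reward received k steps in the future is weighted by γ^k. The sketch below computes a discounted return and the common rule-of-thumb horizon 1/(1 - γ), which is about 100 steps for the default 0.99. Both helpers are illustrative and not part of Schola.

```python
def discounted_return(rewards, gamma=0.99):
    """G_0 = r_0 + gamma*r_1 + gamma^2*r_2 + ..., accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def effective_horizon(gamma=0.99):
    """Rule of thumb: rewards much further than ~1/(1 - gamma) steps ahead contribute little."""
    return 1.0 / (1.0 - gamma)


# A reward of 10 arriving three steps later is scaled by 0.99**3.
print(discounted_return([1.0, 0.0, 0.0, 10.0]))
print(effective_horizon(0.99))
```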

Logging Arguments

-scholav, --schola-verbosity

Verbosity level for the Schola environment

  • Default: 0
  • Type: int
  • Required: False

-rllibv, --rllib-verbosity

Verbosity level for RLlib

  • Default: 1
  • Type: int
  • Required: False

Checkpoint Arguments

--enable-checkpoints

Enable saving checkpoints

  • Default: False
  • Required: False

--checkpoint-dir

Directory to save checkpoints

  • Default: './ckpt'
  • Type: str
  • Required: False

--save-freq

Frequency with which to save checkpoints

  • Default: 100000
  • Type: int
  • Required: False

--name-prefix

Override the name prefix for the checkpoint files (for example, PPO or IMPALA)

  • Type: str
  • Required: False

--export-onnx

Export the model to ONNX format instead of just saving a checkpoint

  • Default: False
  • Required: False

--save-final-policy

Save the final policy after training is complete

  • Default: False
  • Required: False

--resume-from

Path to a saved model to resume training from

  • Type: str
  • Required: False
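The unit of --save-freq is not stated here. Assuming it counts environment timesteps, and that a checkpoint can only be written at the end of a training iteration, the defaults (1,000,000 timesteps, --save-freq 100000, and 500 timesteps per iteration from --train-batch-size-per-learner with one learner) would produce ten checkpoints. checkpoint_timesteps is a hypothetical helper that lists where they would land under those assumptions.

```python
def checkpoint_timesteps(timesteps=1_000_000, save_freq=100_000, steps_per_iteration=500):
    """Timesteps at which a checkpoint would be written, ASSUMING save_freq counts
    environment steps and saves happen only at training-iteration boundaries."""
    if save_freq <= 0 or steps_per_iteration <= 0:
        raise ValueError("save_freq and steps_per_iteration must be positive")
    saves = []
    next_save = save_freq
    done = 0
    while done < timesteps:
        done += steps_per_iteration          # one training iteration
        if done >= next_save:
            saves.append(done)
            while next_save <= done:         # skip multiples passed within one iteration
                next_save += save_freq
    return saves


print(checkpoint_timesteps())                # ten saves, every 100,000 steps
print(checkpoint_timesteps(3000, 1000, 300)) # saves land on iteration boundaries
```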

Network Architecture Arguments

--fcnet-hiddens

Fully connected network hidden layer sizes

  • Type: int (multiple values allowed)
  • Required: False
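Because --fcnet-hiddens accepts several integers, the layer sizes are given as a space-separated list: --fcnet-hiddens 256 256 describes two hidden layers of 256 units each. The sketch below shows an argparse declaration that would accept this (nargs="+" with type=int is an assumption, not Schola's source) and counts the weights and biases of the resulting dense stack for a hypothetical 8-dimensional observation and 4 outputs. RLlib's actual model may add further heads, such as a value branch, which this count ignores.

```python
import argparse


def parse_hiddens(argv):
    """Hypothetical parser: one or more integers after --fcnet-hiddens."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--fcnet-hiddens", nargs="+", type=int, default=None)
    return parser.parse_args(argv).fcnet_hiddens


def mlp_param_count(in_dim, hiddens, out_dim):
    """Weights plus biases of a plain dense stack in_dim -> hiddens... -> out_dim."""
    sizes = [in_dim, *hiddens, out_dim]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


hiddens = parse_hiddens(["--fcnet-hiddens", "256", "256"])
print(hiddens)                          # [256, 256]
print(mlp_param_count(8, hiddens, 4))   # 69124
```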

Resource Arguments

--num-workers

Number of worker processes

  • Default: 2
  • Type: int
  • Required: False

--num-envs-per-worker

Number of environments per worker

  • Default: 1
  • Type: int
  • Required: False

--num-cpus-per-worker

Number of CPUs per worker

  • Default: 1
  • Type: int
  • Required: False

--num-gpus

Number of GPUs to use

  • Default: 0
  • Type: int
  • Required: False
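The resource flags multiply: the number of parallel environments is --num-workers × --num-envs-per-worker, and the CPU request grows with --num-cpus-per-worker. The back-of-the-envelope sketch below totals them. driver_cpus (one CPU for the main training process) is an assumption about RLlib's defaults rather than a schola-rllib flag, and resource_summary is a hypothetical helper.

```python
import os


def resource_summary(num_workers=2, num_envs_per_worker=1, num_cpus_per_worker=1,
                     num_gpus=0, driver_cpus=1):
    """Rough totals; driver_cpus is an ASSUMPTION, not a schola-rllib flag."""
    return {
        "parallel_envs": num_workers * num_envs_per_worker,
        "cpus_requested": num_workers * num_cpus_per_worker + driver_cpus,
        "gpus_requested": num_gpus,
    }


summary = resource_summary(num_workers=4, num_envs_per_worker=2)
available = os.cpu_count() or 1
print(summary)
if summary["cpus_requested"] > available:
    print(f"Requesting {summary['cpus_requested']} CPUs but only {available} are available.")
```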

Sub-commands

PPO

Proximal Policy Optimization

schola-rllib PPO [-h] [--disable-gae] [--gae-lambda GAE_LAMBDA] [--clip-param CLIP_PARAM]

Optional Arguments

--disable-gae

Disable the Generalized Advantage Estimation (GAE) for the PPO algorithm

  • Default: True (GAE is enabled; passing this flag disables it)
  • Required: False

--gae-lambda

The GAE lambda value for the PPO algorithm

  • Default: 0.95
  • Type: float
  • Required: False
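--gae-lambda is the λ of Generalized Advantage Estimation, and it works together with --gamma. The sketch below is the textbook backward recursion δ_t = r_t + γV(s_{t+1}) - V(s_t), A_t = δ_t + γλA_{t+1}, with bootstrapping cut at episode ends. It illustrates the technique and is not RLlib's or Schola's implementation; gae is a hypothetical helper. Setting λ = 0 gives one-step TD advantages, and λ = 1 gives discounted returns minus the value baseline.

```python
def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout segment.

    values[t] is V(s_t); last_value is V(s_T), used to bootstrap the final step.
    dones[t] is True if the episode ended at step t (no bootstrapping across it).
    Returns (advantages, value_targets), where value_targets = advantages + values.
    """
    advantages = [0.0] * len(rewards)
    next_adv = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        next_adv = delta + gamma * lam * nonterminal * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    targets = [a + v for a, v in zip(advantages, values)]
    return advantages, targets


adv, targets = gae([1.0, 0.0], [0.5, 0.2], last_value=1.0, dones=[False, False])
print(adv, targets)
```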

--clip-param

The clip range for the PPO algorithm

  • Default: 0.2
  • Type: float
  • Required: False
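--clip-param is the ε in PPO's clipped surrogate objective, the mean of min(r·A, clip(r, 1 - ε, 1 + ε)·A), where r = π_new(a|s)/π_old(a|s) is the probability ratio and A the advantage. Below is a minimal per-sample sketch of that objective in plain Python; training maximizes it (a loss function minimizes its negative). ppo_clipped_objective is a hypothetical helper, not RLlib's loss code.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_param=0.2):
    """Mean over samples of min(r*A, clip(r, 1-eps, 1+eps)*A), r = exp(logp_new - logp_old)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                               # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_param), 1.0 + clip_param)  # clip to [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)                        # pessimistic bound
    return total / len(advantages)


# Ratio 1.5 with positive advantage is capped at 1.2; ratio 0.5 with negative
# advantage is pushed to 0.8, so the mean is (1.2 - 0.8) / 2.
print(ppo_clipped_objective([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0]))
```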

APPO

Asynchronous Proximal Policy Optimization algorithm.

schola-rllib APPO [-h] [--disable-vtrace] [--vtrace-clip-rho-threshold VTRACE_CLIP_RHO_THRESHOLD] [--vtrace-clip-pg-rho-threshold VTRACE_CLIP_PG_RHO_THRESHOLD] [--disable-gae]
                  [--gae-lambda GAE_LAMBDA] [--clip-param CLIP_PARAM]

Optional Arguments

Algorithm-specific arguments for APPO configuration.

--disable-vtrace

Disable the V-trace algorithm

  • Default: True (V-trace is enabled; passing this flag disables it)
  • Required: False

--vtrace-clip-rho-threshold

The clip threshold for V-trace rho values

  • Default: 1.0
  • Type: float
  • Required: False

--vtrace-clip-pg-rho-threshold

The clip threshold for V-trace rho values in the policy gradient

  • Default: 1.0
  • Type: float
  • Required: False
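Both thresholds truncate the importance ratio ρ_t = π(a_t|s_t)/μ(a_t|s_t) between the policy being trained (π) and the behaviour policy (μ) that collected the data on the workers. Following the V-trace definition from the IMPALA paper, --vtrace-clip-rho-threshold caps the ratio used for the value targets and --vtrace-clip-pg-rho-threshold caps the ratio that weights the policy-gradient term. A small sketch of that truncation; clipped_ratios is a hypothetical helper.

```python
import math


def clipped_ratios(logp_target, logp_behavior, clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0):
    """Per step, return (rho_bar, rho_pg): min(threshold, pi/mu) for the value and
    policy-gradient terms respectively."""
    rhos = [math.exp(t - b) for t, b in zip(logp_target, logp_behavior)]  # pi / mu
    rho_bar = [min(clip_rho_threshold, r) for r in rhos]
    rho_pg = [min(clip_pg_rho_threshold, r) for r in rhos]
    return rho_bar, rho_pg


# A step where pi is twice as likely as mu: capped at 1.0 for values, 1.5 for the gradient.
print(clipped_ratios([math.log(2.0), math.log(0.5)], [0.0, 0.0], 1.0, 1.5))
```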

--disable-gae

Disable the Generalized Advantage Estimation (GAE) for the APPO algorithm

  • Default: True (GAE is enabled; passing this flag disables it)
  • Required: False

--gae-lambda

The GAE lambda value for the APPO algorithm

  • Default: 0.95
  • Type: float
  • Required: False

--clip-param

The clip range for the APPO algorithm

  • Default: 0.2
  • Type: float
  • Required: False

IMPALA

Importance Weighted Actor-Learner Architecture algorithm.

schola-rllib IMPALA [-h] [--disable-vtrace] [--vtrace-clip-rho-threshold VTRACE_CLIP_RHO_THRESHOLD] [--vtrace-clip-pg-rho-threshold VTRACE_CLIP_PG_RHO_THRESHOLD]

Optional Arguments

Algorithm-specific arguments for IMPALA configuration.

--disable-vtrace

Disable the V-trace algorithm

  • Default: True (V-trace is enabled; passing this flag disables it)
  • Required: False

--vtrace-clip-rho-threshold

The clip threshold for V-trace rho values

  • Default: 1.0
  • Type: float
  • Required: False

--vtrace-clip-pg-rho-threshold

The clip threshold for V-trace rho values in the policy gradient

  • Default: 1.0
  • Type: float
  • Required: False
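The sketch below shows how the two thresholds enter the V-trace targets, following the recursion from the IMPALA paper: v_s = V(x_s) + δ_s + γ c_s (v_{s+1} - V(x_{s+1})), where δ_s = ρ̄_s (r_s + γ V(x_{s+1}) - V(x_s)). It assumes a single rollout segment without episode boundaries and fixes the trace-cutting constant c̄ at 1.0, since no flag for it is documented here. vtrace is a hypothetical helper, not RLlib's implementation. When π and μ agree, every ratio is 1 and the targets reduce to n-step bootstrapped returns.

```python
import math


def vtrace(logp_target, logp_behavior, rewards, values, bootstrap_value, gamma=0.99,
           clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0, clip_c_threshold=1.0):
    """V-trace value targets and policy-gradient advantages for one segment without
    episode boundaries. clip_c_threshold is NOT a schola-rllib flag; fixed at 1.0 by assumption."""
    n = len(rewards)
    rhos = [math.exp(t - b) for t, b in zip(logp_target, logp_behavior)]  # pi / mu
    clipped_rhos = [min(clip_rho_threshold, r) for r in rhos]
    cs = [min(clip_c_threshold, r) for r in rhos]
    next_values = list(values[1:]) + [bootstrap_value]

    # Backward recursion: vs[s] - V(x_s) = delta_s + gamma * c_s * (vs[s+1] - V(x_{s+1}))
    vs_minus_v = [0.0] * n
    acc = 0.0
    for s in reversed(range(n)):
        delta = clipped_rhos[s] * (rewards[s] + gamma * next_values[s] - values[s])
        acc = delta + gamma * cs[s] * acc
        vs_minus_v[s] = acc
    vs = [d + v for d, v in zip(vs_minus_v, values)]

    # Policy-gradient advantages use the next-step V-trace target (bootstrap at the end).
    next_vs = vs[1:] + [bootstrap_value]
    pg_advantages = [
        min(clip_pg_rho_threshold, rhos[s]) * (rewards[s] + gamma * next_vs[s] - values[s])
        for s in range(n)
    ]
    return vs, pg_advantages


# On-policy check: identical log-probs make every ratio 1.
print(vtrace([0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0], 0.0, gamma=0.5))
```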