schola-rllib Command
This script trains an RLlib model using Schola, allowing customization of training, logging, network architecture, and resource allocation through command-line arguments.
Usage
usage: schola-rllib [-h] [--launch-unreal] [--executable-path EXECUTABLE_PATH] [--headless] [-p PORT] [--map MAP] [--fps FPS] [--disable-script] [-t TIMESTEPS] [--learning-rate LEARNING_RATE] [--minibatch-size MINIBATCH_SIZE] [--train-batch-size-per-learner TRAIN_BATCH_SIZE_PER_LEARNER] [--num-sgd-iter NUM_SGD_ITER] [--gamma GAMMA] [-scholav SCHOLA_VERBOSITY] [-rllibv RLLIB_VERBOSITY] [--enable-checkpoints] [--checkpoint-dir CHECKPOINT_DIR] [--save-freq SAVE_FREQ] [--name-prefix NAME_PREFIX_OVERRIDE] [--export-onnx] [--save-final-policy] [--resume-from RESUME_FROM] [--fcnet-hiddens FCNET_HIDDENS [FCNET_HIDDENS ...]] [--num-workers NUM_WORKERS] [--num-envs-per-worker NUM_ENVS_PER_WORKER] [--num-cpus-per-worker NUM_CPUS_PER_WORKER] [--num-gpus NUM_GPUS] {PPO,APPO,IMPALA} ...
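For example, a minimal run that connects to an already-running Unreal Engine instance and trains with PPO might look like the following (the timestep count is illustrative, not a recommendation):

```shell
# Train a PPO policy for 500k timesteps against an Unreal instance
# already listening on the default port (15151)
schola-rllib -t 500000 PPO
```

Note that the algorithm sub-command (PPO, APPO, or IMPALA) comes last, after all shared options.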
Unreal Process Arguments
--launch-unreal
Launch Unreal Engine automatically
- Default: False
- Required: False
--executable-path
Path to the Unreal Engine executable
- Type: str
- Required: False
--headless
Run Unreal Engine in headless mode
- Default: False
- Required: False
-p, --port
Port for Unreal Engine communication
- Default: 15151
- Type: int
- Required: False
--map
Map to load in Unreal Engine
- Type: str
- Required: False
--fps
Target FPS for Unreal Engine
- Default: 60
- Type: int
- Required: False
--disable-script
Disable script execution in Unreal Engine
- Default: False
- Required: False
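Taken together, these options let the script start and manage the Unreal process itself rather than attaching to one you launched manually. A sketch of such an invocation (the executable path and map name below are placeholders, not real files):

```shell
# Launch a packaged Unreal build headlessly on a custom port and map
schola-rllib --launch-unreal \
  --executable-path /path/to/MyGame.exe \
  --headless -p 15151 --map MyTrainingMap --fps 60 \
  PPO
```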
Training Arguments
-t, --timesteps
Number of timesteps to train
- Default: 1000000
- Type: int
- Required: False
--learning-rate
Learning rate for training
- Default: 0.0003
- Type: float
- Required: False
--minibatch-size
Minibatch size for training
- Default: 32
- Type: int
- Required: False
--train-batch-size-per-learner
Training batch size per learner
- Default: 500
- Type: int
- Required: False
--num-sgd-iter
Number of SGD iterations
- Default: 10
- Type: int
- Required: False
--gamma
Discount factor for future rewards
- Default: 0.99
- Type: float
- Required: False
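These options map onto standard on-policy training hyperparameters (learning rate, batch sizes, SGD epochs per batch, discount factor). For instance, a longer run with larger batches and a shorter effective reward horizon might look like this (all values are illustrative):

```shell
schola-rllib -t 2000000 --learning-rate 1e-4 \
  --minibatch-size 64 --train-batch-size-per-learner 1000 \
  --num-sgd-iter 20 --gamma 0.95 \
  PPO
```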
Logging Arguments
-scholav, --schola-verbosity
Verbosity level for the Schola environment
- Default: 0
- Type: int
- Required: False
-rllibv, --rllib-verbosity
Verbosity level for RLlib
- Default: 1
- Type: int
- Required: False
Checkpoint Arguments
--enable-checkpoints
Enable saving checkpoints
- Default: False
- Required: False
--checkpoint-dir
Directory to save checkpoints
- Default: './ckpt'
- Type: str
- Required: False
--save-freq
Frequency with which to save checkpoints
- Default: 100000
- Type: int
- Required: False
--name-prefix
Override the name prefix for the checkpoint files (e.g. SAC, PPO)
- Type: str
- Required: False
--export-onnx
Export the model to ONNX format instead of just saving a checkpoint
- Default: False
- Required: False
--save-final-policy
Save the final policy after training is complete
- Default: False
- Required: False
--resume-from
Path to a saved model to resume training from
- Type: str
- Required: False
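A typical checkpointing setup saves periodically during training, keeps the final policy, and exports it to ONNX for use outside of Python. A sketch (the directory name is illustrative):

```shell
schola-rllib -t 1000000 \
  --enable-checkpoints --checkpoint-dir ./ckpt --save-freq 100000 \
  --save-final-policy --export-onnx \
  PPO
```

A later run could then pass one of the saved checkpoints to `--resume-from` to continue training.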
Network Architecture Arguments
--fcnet-hiddens
Fully connected network hidden layer sizes
- Type: int (multiple values allowed)
- Required: False
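`--fcnet-hiddens` takes one integer per hidden layer. For example, to train with a two-layer fully connected network of 512 units each (sizes are illustrative):

```shell
schola-rllib --fcnet-hiddens 512 512 PPO
```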
Resource Arguments
--num-workers
Number of worker processes
- Default: 2
- Type: int
- Required: False
--num-envs-per-worker
Number of environments per worker
- Default: 1
- Type: int
- Required: False
--num-cpus-per-worker
Number of CPUs per worker
- Default: 1
- Type: int
- Required: False
--num-gpus
Number of GPUs to use
- Default: 0
- Type: int
- Required: False
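These options scale rollout collection across worker processes and allocate hardware to the learner. One possible configuration for a machine with a single GPU (the counts are illustrative and should be tuned to your hardware):

```shell
# Four rollout workers with two environments each; train on one GPU
schola-rllib --num-workers 4 --num-envs-per-worker 2 \
  --num-cpus-per-worker 1 --num-gpus 1 \
  PPO
```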
Sub-commands
PPO
Proximal Policy Optimization
schola-rllib PPO [-h] [--disable-gae] [--gae-lambda GAE_LAMBDA] [--clip-param CLIP_PARAM]
Optional Arguments
--disable-gae
Disable Generalized Advantage Estimation (GAE) for the PPO algorithm
- Default: True
- Required: False
--gae-lambda
The GAE lambda value for the PPO algorithm
- Default: 0.95
- Type: float
- Required: False
--clip-param
The clip range for the PPO algorithm
- Default: 0.2
- Type: float
- Required: False
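PPO-specific options follow the sub-command rather than preceding it. For example, a run with a tighter clip range and a lower GAE lambda (values are illustrative):

```shell
schola-rllib -t 1000000 PPO --gae-lambda 0.9 --clip-param 0.1
```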
APPO
Asynchronous Proximal Policy Optimization algorithm.
schola-rllib APPO [-h] [--disable-vtrace] [--vtrace-clip-rho-threshold VTRACE_CLIP_RHO_THRESHOLD] [--vtrace-clip-pg-rho-threshold VTRACE_CLIP_PG_RHO_THRESHOLD] [--disable-gae] [--gae-lambda GAE_LAMBDA] [--clip-param CLIP_PARAM]
Optional Arguments
Algorithm-specific arguments for APPO configuration.
--disable-vtrace
Disable the V-trace algorithm
- Default: True
- Required: False
--vtrace-clip-rho-threshold
The clip threshold for V-trace rho values
- Default: 1.0
- Type: float
- Required: False
--vtrace-clip-pg-rho-threshold
The clip threshold for V-trace rho values in the policy gradient
- Default: 1.0
- Type: float
- Required: False
--disable-gae
Disable Generalized Advantage Estimation (GAE) for the PPO algorithm
- Default: True
- Required: False
--gae-lambda
The GAE lambda value for the PPO algorithm
- Default: 0.95
- Type: float
- Required: False
--clip-param
The clip range for the PPO algorithm
- Default: 0.2
- Type: float
- Required: False
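Since APPO collects experience asynchronously, it pairs naturally with multiple rollout workers. A sketch combining shared resource options with APPO-specific V-trace settings (values are illustrative):

```shell
schola-rllib --num-workers 4 APPO \
  --vtrace-clip-rho-threshold 1.0 --clip-param 0.2
```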
IMPALA
Importance Weighted Actor-Learner Architecture algorithm.
schola-rllib IMPALA [-h] [--disable-vtrace] [--vtrace-clip-rho-threshold VTRACE_CLIP_RHO_THRESHOLD] [--vtrace-clip-pg-rho-threshold VTRACE_CLIP_PG_RHO_THRESHOLD]
Optional Arguments
Algorithm-specific arguments for IMPALA configuration.
--disable-vtrace
Disable the V-trace algorithm
- Default: True
- Required: False
--vtrace-clip-rho-threshold
The clip threshold for V-trace rho values
- Default: 1.0
- Type: float
- Required: False
--vtrace-clip-pg-rho-threshold
The clip threshold for V-trace rho values in the policy gradient
- Default: 1.0
- Type: float
- Required: False
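IMPALA is designed for high-throughput distributed collection, so it is usually run with more workers than the default. One possible invocation (the worker and environment counts are illustrative):

```shell
schola-rllib --num-workers 8 --num-envs-per-worker 2 IMPALA \
  --vtrace-clip-rho-threshold 1.0
```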