schola-rllib Command
This script trains an RLlib model using Schola, allowing customization of training, logging, network architecture, and resource allocation through command-line arguments.
usage: schola-rllib [-h] [--launch-unreal] [--executable-path EXECUTABLE_PATH] [--headless] [-p PORT] [--map MAP] [--fps FPS] [--disable-script] [-t TIMESTEPS]
[--learning-rate LEARNING_RATE] [--minibatch-size MINIBATCH_SIZE] [--train-batch-size-per-learner TRAIN_BATCH_SIZE_PER_LEARNER]
[--num-sgd-iter NUM_SGD_ITER] [--gamma GAMMA] [-scholav SCHOLA_VERBOSITY] [-rllibv RLLIB_VERBOSITY] [--enable-checkpoints]
[--checkpoint-dir CHECKPOINT_DIR] [--save-freq SAVE_FREQ] [--name-prefix NAME_PREFIX_OVERRIDE] [--export-onnx] [--save-final-policy]
[--resume-from RESUME_FROM] [--fcnet-hiddens FCNET_HIDDENS [FCNET_HIDDENS ...]] [--activation ACTIVATION] [--use-attention]
[--attention-dim ATTENTION_DIM] [--num-gpus NUM_GPUS] [--num-cpus NUM_CPUS] [--num-cpus-per-learner NUM_CPUS_PER_LEARNER]
[--num-gpus-per-learner NUM_GPUS_PER_LEARNER] [--num-learners NUM_LEARNERS] [--num-cpus-for-main-process NUM_CPUS_FOR_MAIN_PROCESS]
[--using-cluster]
{PPO,APPO,IMPALA} ...
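Because the algorithm is chosen with a sub-command, argument order matters. Assuming the CLI is built on Python's argparse (as the usage line and the field layout suggest), options listed before `{PPO,APPO,IMPALA}` belong to the top-level parser, and algorithm-specific options must follow the sub-command name. The sketch below reproduces that behavior with a hypothetical, heavily trimmed stand-in parser (only `-t`, `--learning-rate`, and a `PPO` sub-command with `--clip-param`). It is not Schola's actual parser.

```python
import argparse

# Hypothetical, trimmed stand-in for the schola-rllib parser (assumption: argparse
# with one sub-parser per algorithm). The real CLI defines many more options.
parser = argparse.ArgumentParser(prog="schola-rllib")
parser.add_argument("-t", "--timesteps", type=int, default=3000)
parser.add_argument("--learning-rate", type=float, default=0.0003)
subparsers = parser.add_subparsers(dest="algorithm")
ppo_parser = subparsers.add_parser("PPO")
ppo_parser.add_argument("--clip-param", type=float, default=0.2)

# Correct order: top-level options, then the sub-command, then its own options.
args = parser.parse_args(
    ["-t", "100000", "--learning-rate", "0.001", "PPO", "--clip-param", "0.1"]
)
print(args.algorithm, args.timesteps, args.learning_rate, args.clip_param)

# Wrong order: everything after "PPO" is handed to the PPO sub-parser, which does
# not know -t, so argparse reports an error and exits.
try:
    parser.parse_args(["PPO", "-t", "100000"])
    misordered_ok = True
except SystemExit:
    misordered_ok = False
print("misordered command accepted:", misordered_ok)
```

On the command line, the correctly ordered form corresponds to `schola-rllib -t 100000 --learning-rate 0.001 PPO --clip-param 0.1`.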
Unreal Process Arguments
- --launch-unreal
Flag indicating whether the script should launch a standalone Unreal Engine process
Default: False
Const: True
Required: False
- --executable-path
Path to the standalone executable to use when launching a standalone environment
Type: str
Required: False
- --headless
Flag indicating whether the standalone Unreal Engine process should run in headless mode
Default: False
Const: True
Required: False
- -p, --port
Port used to connect to the Unreal Engine process. If omitted when running standalone, an open port is selected automatically. A port is required when connecting to an existing Unreal Engine process.
Type: int
Required: False
- --map
Map to load when launching a standalone Unreal Engine process
Type: str
Required: False
- --fps
Fixed FPS to use when running standalone. If omitted, no fixed timestep is used.
Type: int
Required: False
- --disable-script
Flag indicating whether to disable the autolaunch script setting in the Unreal Engine Schola plugin. Useful for testing.
Default: False
Const: True
Required: False
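The --port entry says an open port is selected automatically when none is given for a standalone launch. A common way to do this in Python is to bind a socket to port 0 and let the operating system assign a free port; the sketch below shows that technique. The `find_open_port` helper is hypothetical, and this is not necessarily how Schola picks its port.

```python
import socket


def find_open_port(host: str = "127.0.0.1") -> int:
    """Hypothetical helper: ask the OS for a currently free TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to assign any available ephemeral port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_open_port()
print("selected port:", port)
```

The port is released when the socket closes, so another process could in principle claim it before Unreal binds to it; passing an explicit -p avoids that race.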
Training Arguments
- -t, --timesteps
Number of timesteps to train for
Default: 3000
Type: int
Required: False
- --learning-rate
Learning rate for the selected algorithm (PPO, APPO, or IMPALA)
Default: 0.0003
Type: float
Required: False
- --minibatch-size
Size of each minibatch used for training, drawn from the train batch given to each learner
Default: 128
Type: int
Required: False
- --train-batch-size-per-learner
Size of the train batch given to each learner
Default: 256
Type: int
Required: False
- --num-sgd-iter
Number of SGD iterations performed on each train batch
Default: 5
Type: int
Required: False
- --gamma
Discount factor for the selected algorithm (PPO, APPO, or IMPALA)
Default: 0.99
Type: float
Required: False
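To see how the batch-related defaults interact, the sketch below counts gradient updates per train batch in a PPO-style loop, where each SGD iteration is one pass over the learner's train batch split into minibatches, and computes a discounted return with the default --gamma. It assumes the train batch divides evenly into minibatches (true for the defaults, 256 / 128); RLlib's handling of remainders, and the update schedule used by APPO and IMPALA, may differ. The function names are hypothetical.

```python
def updates_per_train_batch(train_batch_size_per_learner: int,
                            minibatch_size: int,
                            num_sgd_iter: int) -> int:
    # One SGD iteration = one pass over the train batch, split into minibatches.
    minibatches_per_pass = train_batch_size_per_learner // minibatch_size
    return minibatches_per_pass * num_sgd_iter


def discounted_return(rewards: list[float], gamma: float) -> float:
    # G_0 = r_0 + gamma * r_1 + gamma^2 * r_2 + ..., accumulated backwards.
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Defaults: 256 samples per learner, minibatches of 128, 5 SGD iterations.
print(updates_per_train_batch(256, 128, 5))  # 10
print(discounted_return([1.0, 1.0, 1.0], gamma=0.99))
```

With gamma close to 1, rewards far in the future still carry most of their weight; lowering --gamma makes the agent favor near-term reward.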
Logging Arguments
- -scholav, --schola-verbosity
Verbosity level for the Schola environment
Default: 0
Type: int
Required: False
- -rllibv, --rllib-verbosity
Verbosity level for RLlib
Default: 1
Type: int
Required: False
Checkpoint Arguments
- --enable-checkpoints
Enable saving checkpoints
Default: False
Const: True
Required: False
- --checkpoint-dir
Directory to save checkpoints
Default: <current working directory>/ckpt
Type: str
Required: False
- --save-freq
Frequency with which to save checkpoints
Default: 100000
Type: int
Required: False
- --name-prefix
Override the name prefix for the checkpoint files (e.g., PPO)
Type: str
Required: False
- --export-onnx
Whether to export the model to ONNX format in addition to saving a checkpoint
Default: False
Const: True
Required: False
- --save-final-policy
Whether to save the final policy after training is complete
Default: False
Const: True
Required: False
- --resume-from
Path to a checkpoint to resume from
Type: str
Required: False
Network Architecture Arguments
- --fcnet-hiddens
Sizes of the hidden layers in the fully connected network, one integer per layer
Default: [512, 512]
Type: int
Required: False
- --activation
Activation function for the fully connected network
Default: ActivationFunctionEnum.ReLU
Type: ActivationFunctionEnum
Required: False
- --use-attention
Whether to use attention in the model
Default: False
Const: True
Required: False
- --attention-dim
The dimension of the attention layer
Default: 64
Type: int
Required: False
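--fcnet-hiddens accepts several integers (the usage line shows `FCNET_HIDDENS [FCNET_HIDDENS ...]`), so `--fcnet-hiddens 256 128 64` requests three hidden layers. The sketch below parses that with a hypothetical stand-in option built on argparse's `nargs="+"` and lists the resulting dense-layer shapes. The observation and output sizes (8 and 4) are made up, and how RLlib attaches its policy and value heads is not modeled.

```python
import argparse

# Hypothetical stand-in for the --fcnet-hiddens option: one integer per hidden layer.
parser = argparse.ArgumentParser()
parser.add_argument("--fcnet-hiddens", type=int, nargs="+", default=[512, 512])

args = parser.parse_args(["--fcnet-hiddens", "256", "128", "64"])


def layer_shapes(obs_dim: int, hiddens: list[int], out_dim: int) -> list[tuple[int, int]]:
    # Each consecutive pair of sizes defines one dense layer's (in, out) shape.
    sizes = [obs_dim, *hiddens, out_dim]
    return list(zip(sizes[:-1], sizes[1:]))


print(args.fcnet_hiddens)  # [256, 128, 64]
# obs_dim=8 and out_dim=4 are illustrative sizes only.
print(layer_shapes(8, args.fcnet_hiddens, 4))
```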
Resource Arguments
- --num-gpus
Number of GPUs to use
Default: 0
Type: int
Required: False
- --num-cpus
Number of CPUs to use
Default: 1
Type: int
Required: False
- --num-cpus-per-learner
Number of CPUs to use per learner process
Default: 1
Type: int
Required: False
- --num-gpus-per-learner
Number of GPUs to use per learner process
Default: 0
Type: int
Required: False
- --num-learners
Number of learner processes to use
Default: 0
Type: int
Required: False
- --num-cpus-for-main-process
Number of CPUs to use for the main process
Default: 1
Type: int
Required: False
- --using-cluster
Whether Ray is running on a cluster
Default: False
Const: True
Required: False
Sub-commands
PPO
Proximal Policy Optimization
optional arguments
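- --disable-gae
Disable Generalized Advantage Estimation (GAE) for the PPO algorithm. The default of True refers to the underlying GAE setting (GAE enabled); passing this flag turns GAE off.
Default: True
Required: False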
- --gae-lambda
The GAE lambda value for the PPO algorithm
Default: 0.95
Type: float
Required: False
- --clip-param
The clip range for the PPO algorithm
Default: 0.2
Type: float
Required: False
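The PPO options map onto two standard formulas: --gae-lambda is the λ in Generalized Advantage Estimation, and --clip-param is the ε in PPO's clipped surrogate objective. The sketch below implements both for a single trajectory segment using the listed defaults (γ = 0.99 from --gamma, λ = 0.95, ε = 0.2). It assumes the segment contains no episode terminations and is bootstrapped from `last_value`; the function names are hypothetical, and RLlib's own implementation is batched and handles terminals.

```python
def compute_gae(rewards: list[float], values: list[float], last_value: float,
                gamma: float = 0.99, lam: float = 0.95) -> list[float]:
    # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    # A_t     = delta_t + gamma * lam * A_{t+1}   (computed backwards)
    advantages = [0.0] * len(rewards)
    next_value = last_value  # V(s_T), used to bootstrap the final step
    next_advantage = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio: float, advantage: float, clip_param: float = 0.2) -> float:
    # min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r = pi_new(a|s) / pi_old(a|s)
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    return min(ratio * advantage, clipped_ratio * advantage)


advantages = compute_gae([1.0, 0.0, 1.0], [0.5, 0.4, 0.6], last_value=0.0)
print(advantages)
print(clipped_surrogate(1.5, 1.0))  # ratio is clipped to 1 + 0.2
```

With `lam=1.0` the advantages reduce to discounted returns minus value estimates, and with `lam=0.0` to one-step TD errors; --gae-lambda trades off between those two extremes.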
APPO
Asynchronous Proximal Policy Optimization
optional arguments
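- --disable-vtrace
Disable V-trace off-policy correction. The default of True refers to the underlying V-trace setting (V-trace enabled); passing this flag turns V-trace off.
Default: True
Required: False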
- --vtrace-clip-rho-threshold
The clip threshold for V-trace rho values
Default: 1.0
Type: float
Required: False
- --vtrace-clip-pg-rho-threshold
The clip threshold for V-trace rho values in the policy gradient
Default: 1.0
Type: float
Required: False
- --disable-gae
Disable Generalized Advantage Estimation (GAE). The default of True refers to the underlying GAE setting (GAE enabled); passing this flag turns GAE off.
Default: True
Required: False
- --gae-lambda
The GAE lambda value
Default: 0.95
Type: float
Required: False
- --clip-param
The PPO-style clip range
Default: 0.2
Type: float
Required: False
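The --disable-gae and --disable-vtrace entries list a default of True, which is easiest to read as the default of an underlying "enabled" setting that the flag switches off. The sketch below shows that pattern with argparse's `store_false` action. It is an assumption about how these flags are wired, and the destination names `use_gae` and `vtrace` are hypothetical.

```python
import argparse

# Hypothetical sketch: "--disable-*" flags backed by positive settings that default
# to True. Destination names are illustrative, not taken from Schola.
parser = argparse.ArgumentParser(prog="APPO")
parser.add_argument("--disable-vtrace", dest="vtrace", action="store_false")
parser.add_argument("--disable-gae", dest="use_gae", action="store_false")

defaults = parser.parse_args([])
flagged = parser.parse_args(["--disable-gae"])
print(defaults.vtrace, defaults.use_gae)  # True True
print(flagged.vtrace, flagged.use_gae)    # True False
```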
IMPALA
Importance Weighted Actor-Learner Architecture
optional arguments
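- --disable-vtrace
Disable V-trace off-policy correction. The default of True refers to the underlying V-trace setting (V-trace enabled); passing this flag turns V-trace off.
Default: True
Required: False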
- --vtrace-clip-rho-threshold
The clip threshold for V-trace rho values
Default: 1.0
Type: float
Required: False
- --vtrace-clip-pg-rho-threshold
The clip threshold for V-trace rho values in the policy gradient
Default: 1.0
Type: float
Required: False
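The two V-trace thresholds clip different importance ratios: --vtrace-clip-rho-threshold clips the ratio used in the value targets, and --vtrace-clip-pg-rho-threshold clips the ratio used in the policy-gradient advantage. The sketch below follows the V-trace recursion from the IMPALA paper for a single trajectory segment. It is a minimal illustration, not RLlib's implementation: it assumes no episode terminations inside the segment, and it fixes the trace-coefficient clip c̄ at 1.0, which is an assumption of this sketch rather than a Schola option. The function name is hypothetical.

```python
def vtrace_targets(rewards: list[float], values: list[float], bootstrap_value: float,
                   rhos: list[float], gamma: float = 0.99,
                   clip_rho_threshold: float = 1.0,
                   clip_pg_rho_threshold: float = 1.0) -> tuple[list[float], list[float]]:
    """Return (V-trace value targets, policy-gradient advantages).

    rhos are importance ratios pi(a|s) / mu(a|s) between the learner (target)
    policy and the actor (behavior) policy that generated the data.
    """
    n = len(rewards)
    clipped_rhos = [min(clip_rho_threshold, r) for r in rhos]
    cs = [min(1.0, r) for r in rhos]  # trace coefficients, clip assumed to be 1.0
    next_values = values[1:] + [bootstrap_value]

    # Backward recursion: v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1}))
    vs = [0.0] * n
    acc = 0.0
    for t in reversed(range(n)):
        delta = clipped_rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
        acc = delta + gamma * cs[t] * acc
        vs[t] = values[t] + acc

    # Policy-gradient advantage: its own clipped ratio times a one-step target
    # that bootstraps from the next V-trace value.
    vs_next = vs[1:] + [bootstrap_value]
    pg_advantages = [
        min(clip_pg_rho_threshold, rhos[t]) * (rewards[t] + gamma * vs_next[t] - values[t])
        for t in range(n)
    ]
    return vs, pg_advantages


vs, pg = vtrace_targets([1.0, 0.0, 1.0], [0.5, 0.4, 0.6], 0.0, rhos=[0.7, 1.3, 2.0])
print(vs, pg)
```

When every ratio equals 1 (fully on-policy data), the value targets reduce to ordinary bootstrapped n-step returns; ratios above the thresholds are capped, which limits the variance that stale actor data can introduce.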