schola-rllib Command
This script trains an RLlib model using Schola, allowing customization of training, logging, network architecture, and resource allocation through command-line arguments.
Usage
usage: schola-rllib [-h] [--launch-unreal] [--executable-path EXECUTABLE_PATH]
                    [--headless] [-p PORT] [--map MAP] [--fps FPS]
                    [--disable-script] [-t TIMESTEPS]
                    [--learning-rate LEARNING_RATE]
                    [--minibatch-size MINIBATCH_SIZE]
                    [--train-batch-size-per-learner TRAIN_BATCH_SIZE_PER_LEARNER]
                    [--num-sgd-iter NUM_SGD_ITER] [--gamma GAMMA]
                    [-scholav SCHOLA_VERBOSITY] [-rllibv RLLIB_VERBOSITY]
                    [--enable-checkpoints] [--checkpoint-dir CHECKPOINT_DIR]
                    [--save-freq SAVE_FREQ] [--name-prefix NAME_PREFIX_OVERRIDE]
                    [--export-onnx] [--save-final-policy]
                    [--resume-from RESUME_FROM]
                    [--fcnet-hiddens FCNET_HIDDENS [FCNET_HIDDENS ...]]
                    [--num-workers NUM_WORKERS]
                    [--num-envs-per-worker NUM_ENVS_PER_WORKER]
                    [--num-cpus-per-worker NUM_CPUS_PER_WORKER]
                    [--num-gpus NUM_GPUS]
                    {PPO,APPO,IMPALA} ...
Unreal Process Arguments
--launch-unreal
- Launch Unreal Engine automatically
- Default: False
- Required: False
--executable-path
- Path to the Unreal Engine executable
- Type: str
- Required: False
--headless
- Run Unreal Engine in headless mode
- Default: False
- Required: False
-p, --port
- Port for Unreal Engine communication
- Default: 15151
- Type: int
- Required: False
--map
- Map to load in Unreal Engine
- Type: str
- Required: False
--fps
- Target FPS for Unreal Engine
- Default: 60
- Type: int
- Required: False
--disable-script
- Disable script execution in Unreal Engine
- Default: False
- Required: False
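As an illustration of how the Unreal process flags combine, the following Python sketch assembles a command line and prints it without running it. The helper name build_unreal_command, the executable path, and the map name are hypothetical placeholders. The sketch also assumes that --executable-path should accompany --launch-unreal, which this page does not state. Following the usage line, the sub-command (here PPO) comes last.

```python
import shlex


def build_unreal_command(executable_path, map_name=None, port=15151,
                         fps=60, headless=True, algorithm="PPO"):
    """Assemble a schola-rllib argv list from the Unreal process flags.

    Hypothetical helper for illustration; only flags listed on this page are used.
    """
    cmd = ["schola-rllib", "--launch-unreal",
           "--executable-path", executable_path]
    if headless:
        cmd.append("--headless")
    cmd += ["--port", str(port), "--fps", str(fps)]
    if map_name is not None:
        cmd += ["--map", map_name]
    # The usage line places the sub-command after the global options.
    cmd.append(algorithm)
    return cmd


# Placeholder path and map name; substitute values from your own project.
cmd = build_unreal_command("/path/to/Game.exe", map_name="TrainingMap", port=15152)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once the placeholder values are replaced.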
Training Arguments
-t, --timesteps
- Number of timesteps to train
- Default: 1000000
- Type: int
- Required: False
--learning-rate
- Learning rate for training
- Default: 0.0003
- Type: float
- Required: False
--minibatch-size
- Minibatch size for training
- Default: 32
- Type: int
- Required: False
--train-batch-size-per-learner
- Training batch size per learner
- Default: 500
- Type: int
- Required: False
--num-sgd-iter
- Number of SGD iterations
- Default: 10
- Type: int
- Required: False
--gamma
- Discount factor for future rewards
- Default: 0.99
- Type: float
- Required: False
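To give a feel for what the --gamma default means, the sketch below computes a discounted return and the rule-of-thumb horizon 1 / (1 - gamma). It shows the standard discounting formula and is not code taken from Schola or RLlib. The function name discounted_return and the reward sequence are made up for the example.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


gamma = 0.99  # default of --gamma

# Three steps of reward 1.0: 1 + 0.99 + 0.99**2, roughly 2.9701
ret = discounted_return([1.0, 1.0, 1.0], gamma)

# Rule of thumb: rewards much further than this many steps ahead contribute little.
effective_horizon = 1.0 / (1.0 - gamma)  # roughly 100 steps

print(round(ret, 4), round(effective_horizon))
```

A smaller gamma shortens the horizon: 0.9 gives roughly 10 steps, so the agent weighs near-term rewards more heavily.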
Logging Arguments
-scholav, --schola-verbosity
- Verbosity level for the Schola environment
- Default: 0
- Type: int
- Required: False
-rllibv, --rllib-verbosity
- Verbosity level for RLlib
- Default: 1
- Type: int
- Required: False
Checkpoint Arguments
--enable-checkpoints
- Enable saving checkpoints
- Default: False
- Required: False
--checkpoint-dir
- Directory to save checkpoints
- Default: './ckpt'
- Type: str
- Required: False
--save-freq
- Frequency with which to save checkpoints
- Default: 100000
- Type: int
- Required: False
--name-prefix
- Override the name prefix for the checkpoint files (e.g. SAC, PPO, etc.)
- Type: str
- Required: False
--export-onnx
- Export the model to ONNX format instead of just saving a checkpoint
- Default: False
- Required: False
--save-final-policy
- Save the final policy after training is complete
- Default: False
- Required: False
--resume-from
- Path to a saved model to resume training from
- Type: str
- Required: False
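A possible two-step workflow with the checkpoint flags is sketched below: one run that saves checkpoints, and a later run that resumes from one. The commands are only assembled and printed. The checkpoint path passed to --resume-from is a hypothetical placeholder, since this page does not describe how checkpoint files are named. The --save-freq value is passed through as-is, because its unit is not specified here.

```python
import shlex

# First run: save checkpoints during training and keep the final policy.
train_cmd = [
    "schola-rllib",
    "--enable-checkpoints",
    "--checkpoint-dir", "./ckpt",
    "--save-freq", "50000",
    "--save-final-policy",
    "PPO",
]

# Hypothetical path: the real name depends on what the first run wrote to ./ckpt.
saved_checkpoint = "./ckpt/my_checkpoint"

# Second run: continue training from the saved model, still checkpointing.
resume_cmd = [
    "schola-rllib",
    "--resume-from", saved_checkpoint,
    "--enable-checkpoints",
    "--checkpoint-dir", "./ckpt",
    "PPO",
]

for cmd in (train_cmd, resume_cmd):
    print(shlex.join(cmd))
```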
Network Architecture Arguments
--fcnet-hiddens
- Fully connected network hidden layer sizes
- Type: int (multiple values allowed)
- Required: False
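The "multiple values allowed" note means the flag takes a space-separated list, for example --fcnet-hiddens 256 128. The sketch below mimics that behaviour with Python's argparse to show how such a list parses and how it could translate into layer shapes. It is a stand-in parser, not Schola's own. The program name, the None default, and the observation size of 8 are assumptions made for the example.

```python
import argparse

# Stand-in parser: one integer-list option, mirroring the documented type.
parser = argparse.ArgumentParser(prog="fcnet-demo")
parser.add_argument("--fcnet-hiddens", type=int, nargs="+", default=None)

args = parser.parse_args(["--fcnet-hiddens", "256", "128"])
hiddens = args.fcnet_hiddens
print(hiddens)  # [256, 128]

# Assumed observation size, only to illustrate how hidden sizes chain together.
obs_dim = 8
layer_shapes = list(zip([obs_dim] + hiddens[:-1], hiddens))
print(layer_shapes)  # [(8, 256), (256, 128)]
```

Each pair is the (input, output) width of one fully connected layer, so two values describe two hidden layers.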
Resource Arguments
--num-workers
- Number of worker processes
- Default: 2
- Type: int
- Required: False
--num-envs-per-worker
- Number of environments per worker
- Default: 1
- Type: int
- Required: False
--num-cpus-per-worker
- Number of CPUs per worker
- Default: 1
- Type: int
- Required: False
--num-gpus
- Number of GPUs to use
- Default: 0
- Type: int
- Required: False
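The resource flags multiply together, which is easy to overlook when sizing a machine. The rough estimate below assumes that every worker runs its own set of environments and reserves its own CPUs, and it ignores whatever the main training process needs. The function name estimate_resources is hypothetical, and actual scheduling is decided by RLlib.

```python
def estimate_resources(num_workers=2, num_envs_per_worker=1,
                       num_cpus_per_worker=1, num_gpus=0):
    """Rough totals implied by the resource flags (defaults match this page)."""
    return {
        # Environments stepped in parallel across all workers.
        "total_envs": num_workers * num_envs_per_worker,
        # CPUs reserved by workers only; the main process is not counted.
        "worker_cpus": num_workers * num_cpus_per_worker,
        "gpus": num_gpus,
    }


defaults = estimate_resources()
print(defaults)  # {'total_envs': 2, 'worker_cpus': 2, 'gpus': 0}

scaled = estimate_resources(num_workers=4, num_envs_per_worker=8, num_cpus_per_worker=2)
print(scaled)  # {'total_envs': 32, 'worker_cpus': 8, 'gpus': 0}
```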
Sub-commands
PPO
Proximal Policy Optimization algorithm.
Optional Arguments
- Algorithm-specific arguments for PPO configuration
APPO
Asynchronous Proximal Policy Optimization algorithm.
Optional Arguments
- Algorithm-specific arguments for APPO configuration
IMPALA
Importance Weighted Actor-Learner Architecture algorithm.
Optional Arguments
- Algorithm-specific arguments for IMPALA configuration