schola-sb3 Command
This script trains a Stable Baselines3 model using Schola, allowing customization of training, logging, network architecture, and resource allocation through command-line arguments.
usage: schola-sb3 [-h] [-t TIMESTEPS] [--launch-unreal] [--executable-path EXECUTABLE_PATH] [--headless] [-p PORT] [--map MAP] [--fps FPS] [--disable-script]
[--pbar] [--disable-eval] [--enable-tensorboard] [--log-dir LOG_DIR] [--log-freq LOG_FREQ] [--callback-verbosity CALLBACK_VERBOSITY]
[-scholav SCHOLA_VERBOSITY] [-sb3v SB3_VERBOSITY] [--enable-checkpoints] [--checkpoint-dir CHECKPOINT_DIR] [--save-freq SAVE_FREQ]
[--name-prefix NAME_PREFIX_OVERRIDE] [--export-onnx] [--save-final-policy] [--save-replay-buffer] [--save-vecnormalize]
[--resume-from RESUME_FROM] [--load-vecnormalize LOAD_VECNORMALIZE] [--load-replay-buffer LOAD_REPLAY_BUFFER] [--reset-timestep]
[--policy-parameters [POLICY_PARAMETERS ...]] [--critic-parameters [CRITIC_PARAMETERS ...]] [--activation ACTIVATION]
{PPO,SAC} ...
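The usage line implies an ordering: general options come first, then the algorithm sub-command (PPO or SAC), then that algorithm's own options. As a minimal sketch, the snippet below assembles such a command line as an argument list. The helper name `build_command` and the example values are hypothetical; only flags listed on this page are used.

```python
import shlex


def build_command(general_args, algorithm, algorithm_args=()):
    """Assemble a schola-sb3 argument list in the order the usage line shows."""
    if algorithm not in ("PPO", "SAC"):
        raise ValueError("algorithm must be 'PPO' or 'SAC'")
    # General options precede the sub-command; algorithm options follow it.
    return ["schola-sb3", *general_args, algorithm, *algorithm_args]


command = build_command(
    ["--timesteps", "10000", "--pbar", "--enable-tensorboard"],
    "PPO",
    ["--learning-rate", "0.0003", "--n-steps", "1024"],
)
# Print a shell-ready string; the list itself can be passed to subprocess.run.
print(shlex.join(command))
```

Keeping the command as a list avoids shell-quoting problems if a path argument contains spaces.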
optional arguments
- -t, --timesteps
-
Total number of timesteps to train for
Default:
3000
Type: int
Required: False
- --pbar
-
Enable the progress bar. Requires the tqdm and rich packages.
Default:
False
Const: True
Required: False
- --disable-eval
-
Disable evaluation of the model after training. Useful for short runs that might otherwise hang with an untrained model.
Default:
False
Const: True
Required: False
Unreal Process Arguments
- --launch-unreal
-
Flag indicating if the script should launch a standalone Unreal Engine process
Default:
False
Const: True
Required: False
- --executable-path
-
Path to the standalone executable to use when launching a standalone environment
Type: str
Required: False
- --headless
-
Flag indicating if the standalone Unreal Engine process should run in headless mode
Default:
False
Const: True
Required: False
- -p, --port
-
Port used to connect to the Unreal Engine process. If None, an open port is selected automatically when running standalone. A port is required when connecting to an existing Unreal Engine process.
Type: int
Required: False
- --map
-
Map to load when launching a standalone Unreal Engine process
Type: str
Required: False
- --fps
-
Fixed FPS to use when running standalone. If None, no fixed timestep is used.
Type: int
Required: False
- --disable-script
-
Flag indicating if the autolaunch script setting in the Unreal Engine Schola Plugin should be disabled. Useful for testing.
Default:
False
Const: True
Required: False
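The options above describe two modes: launching a standalone Unreal Engine process, or connecting to one that is already running, in which case the port is required. A minimal sketch of that rule follows. The function name `unreal_process_args`, the executable path, and the port number are hypothetical placeholders.

```python
def unreal_process_args(launch_unreal, port=None, executable_path=None,
                        headless=False, map_name=None, fps=None):
    """Build the Unreal process options for either connection mode."""
    if not launch_unreal and port is None:
        # Per the --port description, an existing process needs an explicit port.
        raise ValueError("--port is required when connecting to an existing process")
    args = []
    if launch_unreal:
        args.append("--launch-unreal")
        if executable_path is not None:
            args += ["--executable-path", executable_path]
        if headless:
            args.append("--headless")
        if map_name is not None:
            args += ["--map", map_name]
        if fps is not None:
            args += ["--fps", str(fps)]
    if port is not None:
        args += ["--port", str(port)]
    return args


# Connect to an Unreal Engine process that is already running (port is a placeholder).
existing = unreal_process_args(False, port=8000)

# Launch a headless standalone build at a fixed 60 FPS (path is a placeholder).
standalone = unreal_process_args(
    True, executable_path="path/to/Game.exe", headless=True, fps=60
)
```

When launching standalone, the port is left out here so that an open port is chosen automatically, as the `--port` description states.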
Logging Arguments
- --enable-tensorboard
-
Enable TensorBoard logging
Default:
False
Const: True
Required: False
- --log-dir
-
Directory in which to save TensorBoard logs, if enabled
Default:
'./logs'
Type: str
Required: False
- --log-freq
-
Frequency with which to log to TensorBoard, if enabled
Default:
10
Type: int
Required: False
- --callback-verbosity
-
Verbosity level for any Stable Baselines3 callback functions
Default:
0
Type: int
Required: False
- -scholav, --schola-verbosity
-
Verbosity level for Schola environment logs.
Default:
0
Type: int
Required: False
- -sb3v, --sb3-verbosity
-
Verbosity level for Stable Baselines3 logs.
Default:
1
Type: int
Required: False
Checkpoint Arguments
- --enable-checkpoints
-
Enable saving checkpoints
Default:
False
Const: True
Required: False
- --checkpoint-dir
-
Directory to save checkpoints
Default:
'<current working directory>/ckpt'
Type: str
Required: False
- --save-freq
-
Frequency with which to save checkpoints
Default:
100000
Type: int
Required: False
- --name-prefix
-
Override the name prefix for the checkpoint files (e.g. SAC, PPO, etc.)
Type: str
Required: False
- --export-onnx
-
Whether to export the model to ONNX format instead of just saving a checkpoint
Default:
False
Const: True
Required: False
- --save-final-policy
-
Whether to save the final policy after training is complete
Default:
False
Const: True
Required: False
- --save-replay-buffer
-
Save the replay buffer during training, if saving checkpoints
Default:
False
Const: True
Required: False
- --save-vecnormalize
-
Save the VecNormalize parameters during training, if saving checkpoints
Default:
False
Const: True
Required: False
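The defaults interact in a way worth checking before a run: the default `--save-freq` (100000) is larger than the default `--timesteps` (3000). The sketch below does the arithmetic under the assumption that one checkpoint is written every `--save-freq` timesteps in a single environment; this page does not confirm how the frequency is counted, and the function name is hypothetical.

```python
def periodic_checkpoint_count(timesteps, save_freq):
    """Estimate how many periodic checkpoints a run would write.

    Assumes one checkpoint every `save_freq` timesteps (an assumption,
    not documented behaviour).
    """
    if save_freq <= 0:
        raise ValueError("save_freq must be positive")
    return timesteps // save_freq


# With the documented defaults, no periodic checkpoint would be reached.
default_run = periodic_checkpoint_count(timesteps=3000, save_freq=100000)

# A longer run with the default save frequency.
long_run = periodic_checkpoint_count(timesteps=500000, save_freq=100000)
```

Under that assumption, short runs may need a smaller `--save-freq` or `--save-final-policy` to keep any saved model.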
Resume Arguments
- --resume-from
-
Path to a saved model to resume training from
Type: str
Required: False
- --load-vecnormalize
-
Path to saved VecNormalize parameters to load, if resuming from a checkpoint
Type: str
Required: False
- --load-replay-buffer
-
Path to a saved replay buffer to load, if resuming from a checkpoint
Type: str
Required: False
- --reset-timestep
-
Reset the timestep counter to 0 when resuming from a checkpoint
Default:
False
Const: True
Required: False
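As a rough illustration of how the resume options combine, the sketch below builds the arguments for continuing a SAC run from a saved model and replay buffer. The function name `resume_args` and every path are hypothetical placeholders; this page does not state how checkpoint files are named.

```python
import shlex


def resume_args(model_path, vecnormalize_path=None, replay_buffer_path=None,
                reset_timestep=False):
    """Build the resume options from the paths of previously saved artifacts."""
    args = ["--resume-from", model_path]
    if vecnormalize_path is not None:
        args += ["--load-vecnormalize", vecnormalize_path]
    if replay_buffer_path is not None:
        args += ["--load-replay-buffer", replay_buffer_path]
    if reset_timestep:
        args.append("--reset-timestep")
    return args


# Placeholder paths; substitute the files written by an earlier run.
resume_command = [
    "schola-sb3",
    *resume_args("path/to/model.zip", replay_buffer_path="path/to/replay_buffer.pkl"),
    "SAC",
]
print(shlex.join(resume_command))
```

The resume options are general options, so they are placed before the sub-command.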
Network Architecture Arguments
- --policy-parameters
-
Network architecture for the policy, given as a space-separated list of integers
Type: int
Required: False
- --critic-parameters
-
Network architecture for the critic, given as a space-separated list of integers. The critic is either the Q-function or the value function, depending on the algorithm.
Type: int
Required: False
- --activation
-
Activation function to use for the network
Default:
ActivationFunctionEnum.ReLU
Type: ActivationFunctionEnum
Required: False
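The usage line shows `--policy-parameters` and `--critic-parameters` accepting zero or more values. If the script is built on Python's argparse with `nargs="*"` (the usage format suggests this, but this page does not confirm it), a list option placed directly before the sub-command would also try to consume the sub-command name. The stand-alone parser below is a hypothetical sketch that reproduces the behaviour; it is not the actual schola-sb3 parser.

```python
import argparse


def make_sketch_parser():
    """Hypothetical stand-in that mirrors a few of the documented options."""
    parser = argparse.ArgumentParser(prog="schola-sb3-sketch")
    parser.add_argument("-t", "--timesteps", type=int, default=3000)
    parser.add_argument("--policy-parameters", nargs="*", type=int, default=None)
    parser.add_argument("--critic-parameters", nargs="*", type=int, default=None)
    subparsers = parser.add_subparsers(dest="algorithm", required=True)
    subparsers.add_parser("PPO")
    subparsers.add_parser("SAC")
    return parser


parser = make_sketch_parser()

# Works: another option separates the integer lists from the sub-command.
ok = parser.parse_args(
    ["--policy-parameters", "64", "64",
     "--critic-parameters", "128", "128",
     "-t", "5000", "PPO"]
)

# Fails in this sketch: the list option swallows "PPO" and rejects it as an int.
try:
    parser.parse_args(["--policy-parameters", "64", "64", "PPO"])
    greedy_failure = False
except SystemExit:
    greedy_failure = True
```

If the real command behaves the same way, placing the list options before another general option avoids the problem.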
Sub-commands
PPO
Proximal Policy Optimization
schola-sb3 PPO [-h] [--learning-rate LEARNING_RATE] [--n-steps N_STEPS] [--batch-size BATCH_SIZE] [--n-epochs N_EPOCHS] [--gamma GAMMA] [--gae-lambda GAE_LAMBDA]
[--clip-range CLIP_RANGE] [--normalize-advantage] [--ent-coef ENT_COEF] [--vf-coef VF_COEF] [--max-grad-norm MAX_GRAD_NORM] [--use-sde]
[--sde-sample-freq SDE_SAMPLE_FREQ]
optional arguments
- --learning-rate
-
The learning rate for the PPO algorithm
Default:
0.0003
Type: float
Required: False
- --n-steps
-
The number of steps to take in each environment before updating the policy
Default:
2048
Type: int
Required: False
- --batch-size
-
The minibatch size used for each gradient update of the PPO algorithm
Default:
64
Type: int
Required: False
- --n-epochs
-
The number of epochs to train the policy for each update
Default:
10
Type: int
Required: False
- --gamma
-
The discount factor for the PPO algorithm
Default:
0.99
Type: float
Required: False
- --gae-lambda
-
The GAE lambda value for the PPO algorithm
Default:
0.95
Type: float
Required: False
- --clip-range
-
The clip range for the PPO algorithm
Default:
0.2
Type: float
Required: False
- --normalize-advantage
-
Whether to normalize the advantage function
Default:
False
Const: True
Required: False
- --ent-coef
-
The entropy coefficient for the PPO algorithm
Default:
0.0
Type: float
Required: False
- --vf-coef
-
The value function coefficient for the PPO algorithm
Default:
0.5
Type: float
Required: False
- --max-grad-norm
-
The maximum gradient norm for the PPO algorithm
Default:
0.5
Type: float
Required: False
- --use-sde
-
Whether to use State Dependent Exploration (SDE) for the PPO algorithm
Default:
False
Const: True
Required: False
- --sde-sample-freq
-
The frequency, in steps, at which to sample new SDE noise for the PPO algorithm
Default:
-1
Type: int
Required: False
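The PPO options `--n-steps`, `--batch-size`, and `--n-epochs` together determine how much optimization happens per rollout. The sketch below works through the arithmetic for the documented defaults. It assumes the options are passed straight to Stable Baselines3's PPO, that data is collected in whole rollouts, and that a single environment is used; `n_envs` and the function name are hypothetical and not options on this page.

```python
import math


def ppo_update_schedule(timesteps=3000, n_steps=2048, batch_size=64,
                        n_epochs=10, n_envs=1):
    """Estimate rollout and gradient-step counts for a PPO run."""
    # One rollout gathers n_steps transitions from each environment.
    rollout_size = n_steps * n_envs
    # Each epoch sweeps the rollout once in minibatches of batch_size.
    minibatches_per_epoch = math.ceil(rollout_size / batch_size)
    # Whole rollouts are collected until the timestep budget is covered.
    rollouts = math.ceil(timesteps / rollout_size)
    return {
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches_per_epoch,
        "gradient_steps_per_rollout": minibatches_per_epoch * n_epochs,
        "rollouts": rollouts,
        "timesteps_collected": rollouts * rollout_size,
    }


schedule = ppo_update_schedule()
for name, value in schedule.items():
    print(f"{name}: {value}")
```

Under these assumptions the default 3000-timestep budget rounds up to two full rollouts, so a run may collect more steps than `--timesteps` requests.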
SAC
Soft Actor-Critic
schola-sb3 SAC [-h] [--learning-rate LEARNING_RATE] [--buffer-size BUFFER_SIZE] [--learning-starts LEARNING_STARTS] [--batch-size BATCH_SIZE] [--tau TAU]
[--gamma GAMMA] [--train-freq TRAIN_FREQ] [--gradient-steps GRADIENT_STEPS] [--optimize-memory-usage] [--ent-coef ENT_COEF]
[--target-update-interval TARGET_UPDATE_INTERVAL] [--target-entropy TARGET_ENTROPY] [--use-sde] [--sde-sample-freq SDE_SAMPLE_FREQ]
optional arguments
- --learning-rate
-
The learning rate for the SAC algorithm
Default:
0.0003
Type: float
Required: False
- --buffer-size
-
The size of the replay buffer for the SAC algorithm
Default:
1000000
Type: int
Required: False
- --learning-starts
-
The number of steps to take before starting to learn with the SAC algorithm
Default:
100
Type: int
Required: False
- --batch-size
-
The number of samples to take from the replay buffer for each update
Default:
256
Type: int
Required: False
- --tau
-
The soft update (Polyak averaging) coefficient, tau, for the SAC target networks
Default:
0.005
Type: float
Required: False
- --gamma
-
The discount factor for the SAC algorithm
Default:
0.99
Type: float
Required: False
- --train-freq
-
The frequency at which to train the policy for the SAC algorithm
Default:
1
Type: int
Required: False
- --gradient-steps
-
The number of gradient steps to take for the SAC algorithm
Default:
1
Type: int
Required: False
- --optimize-memory-usage
-
Whether to optimize memory usage for the SAC algorithm
Default:
False
Const: True
Required: False
- --ent-coef
-
The entropy coefficient for the SAC algorithm
Default:
'auto'
Type: str
Required: False
- --target-update-interval
-
The frequency at which to update the target network for the SAC algorithm
Default:
1
Type: int
Required: False
- --target-entropy
-
The target entropy for the SAC algorithm
Default:
'auto'
Type: str
Required: False
- --use-sde
-
Whether to use State Dependent Exploration (SDE) for the SAC algorithm
Default:
False
Const: True
Required: False
- --sde-sample-freq
-
The frequency, in steps, at which to sample new SDE noise for the SAC algorithm
Default:
-1
Type: int
Required: False
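To make the role of `--tau` concrete, the sketch below shows the soft (Polyak) target update commonly used in SAC, where each target parameter moves a fraction tau toward its online counterpart. It is a plain-Python illustration on lists of floats, assuming `--tau` is used as that coefficient; it is not the code Schola or Stable Baselines3 runs, and the function name is hypothetical.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Return target parameters moved a fraction tau toward the online ones."""
    return [
        tau * online + (1.0 - tau) * target
        for target, online in zip(target_params, online_params)
    ]


# One update with the default tau: the first target value moves 0.5% of the
# way from 0.0 toward 1.0; the second already matches and stays where it is.
updated = soft_update([0.0, 1.0], [1.0, 1.0])
print(updated)
```

A small tau keeps the target network changing slowly, while `--target-update-interval` controls how often such an update is applied.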