schola-sb3 Command
This script trains Stable Baselines3 models using Schola with various configuration options for training, checkpointing, and network architecture.
Usage
usage: schola-sb3 [-h] [--launch-unreal] [--executable-path EXECUTABLE_PATH] [--headless] [-p PORT] [--map MAP] [--fps FPS] [--disable-script] [-scholav SCHOLA_VERBOSITY] [--enable-checkpoints] [--checkpoint-dir CHECKPOINT_DIR] [--save-freq SAVE_FREQ] [--name-prefix NAME_PREFIX] [--export-onnx] [--save-final-policy] [--save-replay-buffer] [--save-vecnormalize] [--resume-from RESUME_FROM] [--load-vecnormalize LOAD_VECNORMALIZE] [--load-replay-buffer LOAD_REPLAY_BUFFER] [--reset-timestep] [--policy-parameters POLICY_PARAMETERS [POLICY_PARAMETERS ...]] [--critic-parameters CRITIC_PARAMETERS [CRITIC_PARAMETERS ...]] {PPO,SAC} ...
Optional Arguments
Standard command-line arguments for the schola-sb3 script.
Unreal Process Arguments
--launch-unreal
Launch Unreal Engine automatically
- Default:
False
- Required: False
--executable-path
Path to the Unreal Engine executable
- Type: str
- Required: False
--headless
Run Unreal Engine in headless mode
- Default:
False
- Required: False
-p, --port
Port for Unreal Engine communication
- Default:
15151
- Type: int
- Required: False
--map
Map to load in Unreal Engine
- Type: str
- Required: False
--fps
Target FPS for Unreal Engine
- Default:
60
- Type: int
- Required: False
--disable-script
Disable script execution in Unreal Engine
- Default:
False
- Required: False
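Taken together, the Unreal process flags above might be combined like this. The executable path and map name are placeholders; substitute your own packaged build:

```shell
# Launch a packaged Unreal build headlessly on the default port,
# loading a specific map at 60 FPS, then train with PPO.
# /path/to/MyGame.exe and TrainingMap are illustrative values.
schola-sb3 --launch-unreal \
  --executable-path /path/to/MyGame.exe \
  --headless \
  --port 15151 \
  --map TrainingMap \
  --fps 60 \
  PPO
```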
Logging Arguments
-scholav, --schola-verbosity
Verbosity level for the Schola environment
- Default:
0
- Type: int
- Required: False
Checkpoint Arguments
--enable-checkpoints
Enable saving checkpoints
- Default:
False
- Required: False
--checkpoint-dir
Directory to save checkpoints
- Default:
'./ckpt'
- Type: str
- Required: False
--save-freq
Frequency with which to save checkpoints
- Default:
100000
- Type: int
- Required: False
--name-prefix
Override the name prefix for the checkpoint files (e.g. SAC, PPO, etc.)
- Type: str
- Required: False
--export-onnx
Whether to export the model to ONNX format instead of just saving a checkpoint
- Default:
False
- Required: False
--save-final-policy
Whether to save the final policy after training is complete
- Default:
False
- Required: False
--save-replay-buffer
Save the replay buffer during training, if saving checkpoints
- Default:
False
- Required: False
--save-vecnormalize
Save the VecNormalize parameters during training, if saving checkpoints
- Default:
False
- Required: False
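A typical checkpointing setup combines these flags as follows. The directory, prefix, and algorithm choice are illustrative:

```shell
# Save a checkpoint every 100,000 steps, also persisting the replay
# buffer and VecNormalize statistics, and export an ONNX model.
# ./ckpt and the SAC prefix are example values.
schola-sb3 --enable-checkpoints \
  --checkpoint-dir ./ckpt \
  --save-freq 100000 \
  --name-prefix SAC \
  --save-replay-buffer \
  --save-vecnormalize \
  --export-onnx \
  SAC
```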
Resume Arguments
--resume-from
Path to a saved model to resume training from
- Type: str
- Required: False
--load-vecnormalize
Path to saved VecNormalize parameters to load, if resuming from a checkpoint
- Type: str
- Required: False
--load-replay-buffer
Path to a saved Replay Buffer to load, if resuming from a checkpoint
- Type: str
- Required: False
--reset-timestep
Reset the timestep counter to 0 when resuming from a checkpoint
- Default:
False
- Required: False
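Resuming a run typically pairs `--resume-from` with the matching replay buffer and VecNormalize files saved alongside the checkpoint. The file names below are illustrative:

```shell
# Resume SAC training from a saved checkpoint, restoring the replay
# buffer and VecNormalize statistics saved with it.
# All three paths are example file names.
schola-sb3 --resume-from ./ckpt/SAC_100000_steps.zip \
  --load-replay-buffer ./ckpt/SAC_replay_buffer_100000_steps.pkl \
  --load-vecnormalize ./ckpt/SAC_vecnormalize_100000_steps.pkl \
  SAC
```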
Network Architecture Arguments
--policy-parameters
Network architecture for the policy, given as a sequence of hidden-layer sizes
- Type: int (multiple values allowed)
- Required: False
--critic-parameters
Network architecture for the critic, given as a sequence of hidden-layer sizes
- Type: int (multiple values allowed)
- Required: False
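Since these flags accept multiple integers, a custom architecture can be sketched like this (assuming each integer specifies the width of one hidden layer):

```shell
# Hypothetical run: policy with two hidden layers of 256 units,
# critic with two hidden layers of 512 units.
schola-sb3 --policy-parameters 256 256 \
  --critic-parameters 512 512 \
  PPO
```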
Sub-commands
PPO
Proximal Policy Optimization
schola-sb3 PPO [-h] [--learning-rate LEARNING_RATE] [--n-steps N_STEPS] [--batch-size BATCH_SIZE] [--n-epochs N_EPOCHS] [--gamma GAMMA] [--gae-lambda GAE_LAMBDA] [--clip-range CLIP_RANGE] [--normalize-advantage] [--ent-coef ENT_COEF] [--vf-coef VF_COEF] [--max-grad-norm MAX_GRAD_NORM] [--use-sde] [--sde-sample-freq SDE_SAMPLE_FREQ]
Optional Arguments
--learning-rate
The learning rate for the PPO algorithm
- Default:
0.0003
- Type: float
- Required: False
--n-steps
The number of steps to take in each environment before updating the policy
- Default:
2048
- Type: int
- Required: False
--batch-size
The minibatch size for each gradient update
- Default:
64
- Type: int
- Required: False
--n-epochs
The number of epochs to train the policy for each update
- Default:
10
- Type: int
- Required: False
--gamma
The discount factor for the PPO algorithm
- Default:
0.99
- Type: float
- Required: False
--gae-lambda
The GAE lambda value for the PPO algorithm
- Default:
0.95
- Type: float
- Required: False
--clip-range
The clip range for the PPO algorithm
- Default:
0.2
- Type: float
- Required: False
--normalize-advantage
Whether to normalize the advantage function
- Default:
False
- Const:
True
- Required: False
--ent-coef
The entropy coefficient for the PPO algorithm
- Default:
0.0
- Type: float
- Required: False
--vf-coef
The value function coefficient for the PPO algorithm
- Default:
0.5
- Type: float
- Required: False
--max-grad-norm
The maximum gradient norm for the PPO algorithm
- Default:
0.5
- Type: float
- Required: False
--use-sde
Whether to use State Dependent Exploration for the PPO algorithm
- Default:
False
- Const:
True
- Required: False
--sde-sample-freq
The frequency at which to sample from the SDE for the PPO algorithm
- Default:
-1
- Type: int
- Required: False
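A full PPO invocation might look like the following. All hyperparameter values shown match the defaults above and are illustrative; the executable path is a placeholder:

```shell
# Example PPO run overriding a few hyperparameters explicitly;
# any flags omitted fall back to the defaults listed above.
schola-sb3 --launch-unreal --executable-path /path/to/MyGame.exe PPO \
  --learning-rate 0.0003 \
  --n-steps 2048 \
  --batch-size 64 \
  --gamma 0.99 \
  --gae-lambda 0.95 \
  --clip-range 0.2
```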
SAC
Soft Actor-Critic
schola-sb3 SAC [-h] [--learning-rate LEARNING_RATE] [--buffer-size BUFFER_SIZE] [--learning-starts LEARNING_STARTS] [--batch-size BATCH_SIZE] [--tau TAU] [--gamma GAMMA] [--train-freq TRAIN_FREQ] [--gradient-steps GRADIENT_STEPS] [--optimize-memory-usage] [--ent-coef ENT_COEF] [--target-update-interval TARGET_UPDATE_INTERVAL] [--target-entropy TARGET_ENTROPY] [--use-sde] [--sde-sample-freq SDE_SAMPLE_FREQ]
Optional Arguments
--learning-rate
The learning rate for the SAC algorithm
- Default:
0.0003
- Type: float
- Required: False
--buffer-size
The size of the replay buffer for the SAC algorithm
- Default:
1000000
- Type: int
- Required: False
--learning-starts
The number of steps to take before starting to learn with the SAC algorithm
- Default:
100
- Type: int
- Required: False
--batch-size
The number of samples to take from the replay buffer for each update
- Default:
256
- Type: int
- Required: False
--tau
The tau value for the SAC algorithm
- Default:
0.005
- Type: float
- Required: False
--gamma
The discount factor for the SAC algorithm
- Default:
0.99
- Type: float
- Required: False
--train-freq
The frequency at which to train the policy for the SAC algorithm
- Default:
1
- Type: int
- Required: False
--gradient-steps
The number of gradient steps to take for the SAC algorithm
- Default:
1
- Type: int
- Required: False
--optimize-memory-usage
Whether to optimize memory usage for the SAC algorithm
- Default:
False
- Const:
True
- Required: False
--ent-coef
The entropy coefficient for the SAC algorithm
- Default:
'auto'
- Type: str
- Required: False
--target-update-interval
The frequency at which to update the target network for the SAC algorithm
- Default:
1
- Type: int
- Required: False
--target-entropy
The target entropy for the SAC algorithm
- Default:
'auto'
- Type: str
- Required: False
--use-sde
Whether to use State Dependent Exploration for the SAC algorithm
- Default:
False
- Const:
True
- Required: False
--sde-sample-freq
The frequency at which to sample from the SDE for the SAC algorithm
- Default:
-1
- Type: int
- Required: False
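A full SAC invocation might look like the following. The hyperparameter values match the defaults above and are illustrative; the executable path is a placeholder:

```shell
# Example SAC run with checkpointing enabled and a few
# hyperparameters set explicitly; omitted flags use the
# defaults listed above.
schola-sb3 --launch-unreal --executable-path /path/to/MyGame.exe \
  --headless --enable-checkpoints SAC \
  --learning-rate 0.0003 \
  --buffer-size 1000000 \
  --learning-starts 100 \
  --batch-size 256 \
  --tau 0.005 \
  --train-freq 1
```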