
schola-sb3 Command

This script trains Stable Baselines3 models using Schola with various configuration options for training, checkpointing, and network architecture.

Usage

Terminal window
usage: schola-sb3 [-h] [--launch-unreal] [--executable-path EXECUTABLE_PATH] [--headless] [-p PORT] [--map MAP] [--fps FPS] [--disable-script]
[-scholav SCHOLA_VERBOSITY] [--enable-checkpoints] [--checkpoint-dir CHECKPOINT_DIR] [--save-freq SAVE_FREQ] [--name-prefix NAME_PREFIX]
[--export-onnx] [--save-final-policy] [--save-replay-buffer] [--save-vecnormalize] [--resume-from RESUME_FROM] [--load-vecnormalize LOAD_VECNORMALIZE]
[--load-replay-buffer LOAD_REPLAY_BUFFER] [--reset-timestep] [--policy-parameters POLICY_PARAMETERS [POLICY_PARAMETERS ...]]
[--critic-parameters CRITIC_PARAMETERS [CRITIC_PARAMETERS ...]]
{PPO,SAC} ...
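As a sketch of a typical invocation, the global options above come before the `{PPO,SAC}` sub-command. The executable path below is a placeholder:

```shell
# Launch a packaged Unreal build headlessly and train a PPO model
# on the documented default port (15151). Paths are placeholders.
schola-sb3 --launch-unreal \
    --executable-path /path/to/MyGame.sh \
    --headless \
    -p 15151 \
    PPO --learning-rate 0.0003 --gamma 0.99
```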

Optional Arguments

Standard command-line arguments for the schola-sb3 script.

Unreal Process Arguments

--launch-unreal

Launch Unreal Engine automatically

  • Default: False
  • Required: False

--executable-path

Path to the Unreal Engine executable

  • Type: str
  • Required: False

--headless

Run Unreal Engine in headless mode

  • Default: False
  • Required: False

-p, --port

Port for Unreal Engine communication

  • Default: 15151
  • Type: int
  • Required: False

--map

Map to load in Unreal Engine

  • Type: str
  • Required: False

--fps

Target FPS for Unreal Engine

  • Default: 60
  • Type: int
  • Required: False

--disable-script

Disable script execution in Unreal Engine

  • Default: False
  • Required: False

Logging Arguments

-scholav, --schola-verbosity

Verbosity level for the Schola environment

  • Default: 0
  • Type: int
  • Required: False

Checkpoint Arguments

--enable-checkpoints

Enable saving checkpoints

  • Default: False
  • Required: False

--checkpoint-dir

Directory to save checkpoints

  • Default: './ckpt'
  • Type: str
  • Required: False

--save-freq

Frequency with which to save checkpoints

  • Default: 100000
  • Type: int
  • Required: False

--name-prefix

Override the name prefix for the checkpoint files (e.g. SAC, PPO, etc.)

  • Type: str
  • Required: False

--export-onnx

Whether to export the model to ONNX format instead of just saving a checkpoint

  • Default: False
  • Required: False

--save-final-policy

Whether to save the final policy after training is complete

  • Default: False
  • Required: False

--save-replay-buffer

Save the replay buffer during training, if saving checkpoints

  • Default: False
  • Required: False

--save-vecnormalize

Save the VecNormalize parameters during training, if saving checkpoints

  • Default: False
  • Required: False
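Combining the checkpoint flags above, a run that saves checkpoints, exports ONNX, and keeps the final policy and VecNormalize statistics might look like this (directory and prefix are placeholders):

```shell
# Save a checkpoint every 50k steps under ./my_run/ckpt with a custom
# name prefix, plus an ONNX export and the final trained policy.
schola-sb3 --enable-checkpoints \
    --checkpoint-dir ./my_run/ckpt \
    --save-freq 50000 \
    --name-prefix my_ppo \
    --export-onnx --save-final-policy --save-vecnormalize \
    PPO
```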

Resume Arguments

--resume-from

Path to a saved model to resume training from

  • Type: str
  • Required: False

--load-vecnormalize

Path to saved VecNormalize parameters to load, if resuming from a checkpoint

  • Type: str
  • Required: False

--load-replay-buffer

Path to a saved Replay Buffer to load, if resuming from a checkpoint

  • Type: str
  • Required: False

--reset-timestep

Reset the timestep counter to 0 when resuming from a checkpoint

  • Default: False
  • Required: False
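Resuming uses the model, VecNormalize parameters, and replay buffer as separate files, per the flags above. The file names below are placeholders following Stable Baselines3's CheckpointCallback naming convention, which this tool may or may not match:

```shell
# Resume a SAC run from a saved checkpoint, restoring normalization
# statistics and the replay buffer (file names are placeholders).
schola-sb3 --resume-from ./ckpt/my_sac_100000_steps.zip \
    --load-vecnormalize ./ckpt/my_sac_vecnormalize_100000_steps.pkl \
    --load-replay-buffer ./ckpt/my_sac_replay_buffer_100000_steps.pkl \
    SAC
```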

Network Architecture Arguments

--policy-parameters

Network architecture for the policy

  • Type: int (multiple values allowed)
  • Required: False

--critic-parameters

Network architecture for the critic

  • Type: int (multiple values allowed)
  • Required: False
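Each flag takes a sequence of integers, which presumably specify the hidden-layer widths of the corresponding network (as in Stable Baselines3's `net_arch`). For example, two 256-unit hidden layers for both networks:

```shell
# Policy and critic each get two hidden layers of 256 units.
schola-sb3 --policy-parameters 256 256 \
    --critic-parameters 256 256 \
    PPO
```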

Sub-commands

PPO

Proximal Policy Optimization

Terminal window
schola-sb3 PPO [-h] [--learning-rate LEARNING_RATE] [--n-steps N_STEPS] [--batch-size BATCH_SIZE] [--n-epochs N_EPOCHS] [--gamma GAMMA] [--gae-lambda GAE_LAMBDA]
[--clip-range CLIP_RANGE] [--normalize-advantage] [--ent-coef ENT_COEF] [--vf-coef VF_COEF] [--max-grad-norm MAX_GRAD_NORM] [--use-sde]
[--sde-sample-freq SDE_SAMPLE_FREQ]

Optional Arguments

--learning-rate

The learning rate for the PPO algorithm

  • Default: 0.0003
  • Type: float
  • Required: False

--n-steps

The number of steps to take in each environment before updating the policy

  • Default: 2048
  • Type: int
  • Required: False

--batch-size

The number of samples to take from the replay buffer for each update

  • Default: 64
  • Type: int
  • Required: False

--n-epochs

The number of epochs to train the policy for each update

  • Default: 10
  • Type: int
  • Required: False

--gamma

The discount factor for the PPO algorithm

  • Default: 0.99
  • Type: float
  • Required: False

--gae-lambda

The GAE lambda value for the PPO algorithm

  • Default: 0.95
  • Type: float
  • Required: False

--clip-range

The clip range for the PPO algorithm

  • Default: 0.2
  • Type: float
  • Required: False

--normalize-advantage

Whether to normalize the advantage function

  • Default: False
  • Const: True
  • Required: False

--ent-coef

The entropy coefficient for the PPO algorithm

  • Default: 0.0
  • Type: float
  • Required: False

--vf-coef

The value function coefficient for the PPO algorithm

  • Default: 0.5
  • Type: float
  • Required: False

--max-grad-norm

The maximum gradient norm for the PPO algorithm

  • Default: 0.5
  • Type: float
  • Required: False

--use-sde

Whether to use State-Dependent Exploration for the PPO algorithm

  • Default: False
  • Const: True
  • Required: False

--sde-sample-freq

The frequency at which to sample from the SDE for the PPO algorithm

  • Default: -1
  • Type: int
  • Required: False
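Putting the PPO options together, a run that overrides a few of the defaults listed above might look like this; any flag omitted keeps its documented default:

```shell
# PPO with a lower learning rate, longer rollouts, a larger batch,
# and State-Dependent Exploration enabled.
schola-sb3 PPO \
    --learning-rate 0.0001 \
    --n-steps 4096 \
    --batch-size 128 \
    --gae-lambda 0.95 \
    --clip-range 0.2 \
    --use-sde
```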

SAC

Soft Actor-Critic

Terminal window
schola-sb3 SAC [-h] [--learning-rate LEARNING_RATE] [--buffer-size BUFFER_SIZE] [--learning-starts LEARNING_STARTS] [--batch-size BATCH_SIZE] [--tau TAU] [--gamma GAMMA]
[--train-freq TRAIN_FREQ] [--gradient-steps GRADIENT_STEPS] [--optimize-memory-usage] [--ent-coef ENT_COEF] [--target-update-interval TARGET_UPDATE_INTERVAL]
[--target-entropy TARGET_ENTROPY] [--use-sde] [--sde-sample-freq SDE_SAMPLE_FREQ]

Optional Arguments

--learning-rate

The learning rate for the SAC algorithm

  • Default: 0.0003
  • Type: float
  • Required: False

--buffer-size

The size of the replay buffer for the SAC algorithm

  • Default: 1000000
  • Type: int
  • Required: False

--learning-starts

The number of steps to take before starting to learn with the SAC algorithm

  • Default: 100
  • Type: int
  • Required: False

--batch-size

The number of samples to take from the replay buffer for each update

  • Default: 256
  • Type: int
  • Required: False

--tau

The tau value for the SAC algorithm

  • Default: 0.005
  • Type: float
  • Required: False

--gamma

The discount factor for the SAC algorithm

  • Default: 0.99
  • Type: float
  • Required: False

--train-freq

The frequency at which to train the policy for the SAC algorithm

  • Default: 1
  • Type: int
  • Required: False

--gradient-steps

The number of gradient steps to take for the SAC algorithm

  • Default: 1
  • Type: int
  • Required: False

--optimize-memory-usage

Whether to optimize memory usage for the SAC algorithm

  • Default: False
  • Const: True
  • Required: False

--ent-coef

The entropy coefficient for the SAC algorithm

  • Default: 'auto'
  • Type: str
  • Required: False

--target-update-interval

The frequency at which to update the target network for the SAC algorithm

  • Default: 1
  • Type: int
  • Required: False

--target-entropy

The target entropy for the SAC algorithm

  • Default: 'auto'
  • Type: str
  • Required: False

--use-sde

Whether to use State-Dependent Exploration for the SAC algorithm

  • Default: False
  • Const: True
  • Required: False

--sde-sample-freq

The frequency at which to sample from the SDE for the SAC algorithm

  • Default: -1
  • Type: int
  • Required: False
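Likewise for SAC, a run overriding a few of the defaults above might look like this; the entropy coefficient and target entropy stay at their 'auto' defaults:

```shell
# SAC with a smaller replay buffer and a delayed learning start;
# train frequency and gradient steps kept at their defaults of 1.
schola-sb3 SAC \
    --buffer-size 500000 \
    --learning-starts 10000 \
    --batch-size 256 \
    --tau 0.005 \
    --gamma 0.99
```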