TrainingSettings
Full path:
schola.scripts.rllib.train.settings.TrainingSettings
Dataclass for generic training settings used in the RLlib training process. This class defines the parameters for training, including the number of timesteps, learning rate, minibatch size, and other hyperparameters that control the training process. These settings are applicable to any RLlib algorithm and can be customized based on the specific requirements of the training job.
TrainingSettings(timesteps = 3000, learning_rate = 0.0003, minibatch_size = 128, train_batch_size_per_learner = 256, num_epochs = 5, gamma = 0.99)
Parameters
- timesteps (Annotated)
- learning_rate (Annotated)
- minibatch_size (Annotated)
- train_batch_size_per_learner (Annotated)
- num_epochs (Annotated)
- gamma (Annotated)
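As a rough illustration of how these defaults fit together, the sketch below uses a hypothetical stand-in dataclass, TrainingSettingsSketch, that mirrors the signature above. It is not the Schola class itself, and the int/float field types are assumptions, since the reference only lists the parameters as Annotated.

```python
from dataclasses import dataclass, replace


@dataclass
class TrainingSettingsSketch:
    """Hypothetical stand-in mirroring the documented defaults (not the Schola class)."""

    timesteps: int = 3000                    # assumed int
    learning_rate: float = 0.0003            # assumed float
    minibatch_size: int = 128                # assumed int
    train_batch_size_per_learner: int = 256  # assumed int
    num_epochs: int = 5                      # assumed int
    gamma: float = 0.99                      # assumed float


# All documented defaults apply when no arguments are given.
defaults = TrainingSettingsSketch()

# Override only the fields that differ; the rest keep their defaults.
longer_run = replace(defaults, timesteps=100_000, learning_rate=1e-4)
print(longer_run)
```

With Schola installed, the real class would presumably be imported from schola.scripts.rllib.train.settings and constructed with the same keyword arguments.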
Methods
__init__
__init__(timesteps = 3000, learning_rate = 0.0003, minibatch_size = 128, train_batch_size_per_learner = 256, num_epochs = 5, gamma = 0.99)
Parameters
- timesteps (Annotated)
- learning_rate (Annotated)
- minibatch_size (Annotated)
- train_batch_size_per_learner (Annotated)
- num_epochs (Annotated)
- gamma (Annotated)
Attributes
gamma
gamma
The discount factor for the reinforcement learning algorithm, used to calculate the present value of future rewards. A value of 0.99 means that a reward is discounted by a further 1% for each time step into the future, so a reward k steps ahead is weighted by 0.99^k. This balances the importance of immediate versus future rewards during training: a value closer to 1.0 prioritizes future rewards more heavily, while a value closer to 0 prioritizes immediate rewards.
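To make the effect of the discount factor concrete, the following minimal sketch computes a discounted return by hand. The helper names discount_weight and discounted_return are hypothetical and not part of Schola or RLlib.

```python
def discount_weight(gamma, steps_ahead):
    """Weight applied to a reward received `steps_ahead` steps in the future."""
    return gamma ** steps_ahead


def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**k * r_k over a reward sequence r_0, r_1, ..."""
    total = 0.0
    for k, reward in enumerate(rewards):
        total += discount_weight(gamma, k) * reward
    return total


# With gamma = 0.99, a reward 100 steps ahead still carries roughly 37% weight.
print(discount_weight(0.99, 100))

# With gamma = 0.5, the same reward would be almost entirely ignored.
print(discount_weight(0.5, 100))
```

Comparing the two printed weights shows why values near 1.0 suit tasks where rewards arrive long after the actions that caused them.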
learning_rate
learning_rate
The learning rate for any chosen algorithm. This controls how much the model weights are adjusted in response to the estimated error each time they are updated. A smaller value means slower but typically more stable learning, while a larger value means faster learning at the risk of instability.
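The role of the learning rate can be sketched with a single plain gradient-descent step. This is an illustrative simplification: sgd_step is a hypothetical helper, and the optimizers used in practice are usually more elaborate than this update rule.

```python
def sgd_step(weights, grads, learning_rate=0.0003):
    """Move each weight against its gradient, scaled by the learning rate."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]


weights = [1.0, -2.0]
grads = [10.0, 10.0]

# Small step: weights barely move.
small = sgd_step(weights, grads, learning_rate=0.0003)

# Large step: the same gradient moves the weights much further.
large = sgd_step(weights, grads, learning_rate=0.1)
print(small, large)
```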
minibatch_size
minibatch_size
The size of the minibatch for training. This is the number of samples used in each iteration of training to update the model weights. A larger batch size can lead to more stable estimates of the gradient, but requires more memory and can slow down training if too large.
num_epochs
num_epochs
The number of training epochs for each batch, that is, the number of passes made over the whole training batch. More epochs can lead to better convergence, but also increase the training time. Alias for num_sgd_iter.
timesteps
timesteps
Stopping threshold for sampled environment steps (num_env_steps_sampled_lifetime). By default this is an absolute lifetime cap so the same command works for both fresh runs and -resume-from without hand-tuning totals. When -reset-timestep is set, this value is instead treated as additional steps to train beyond the checkpoint.
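One way to read the two modes is sketched below. The function remaining_steps and its arguments are hypothetical and only model the behavior described above; they are not Schola's actual implementation.

```python
def remaining_steps(timesteps, checkpoint_steps=0, reset_timestep=False):
    """Environment steps still to sample before the stopping threshold is hit.

    timesteps:        the configured stopping threshold
    checkpoint_steps: steps already sampled in the checkpoint (0 for a fresh run)
    reset_timestep:   if True, treat `timesteps` as additional steps
    """
    if reset_timestep:
        # Threshold counts only steps taken beyond the checkpoint.
        return timesteps
    # Threshold is an absolute lifetime cap, so earlier steps count toward it.
    return max(0, timesteps - checkpoint_steps)


print(remaining_steps(3000))                              # fresh run
print(remaining_steps(3000, checkpoint_steps=2000))       # resumed, lifetime cap
print(remaining_steps(3000, checkpoint_steps=2000, reset_timestep=True))
```

Under this reading, resuming a checkpoint that has already reached the cap trains for no further steps unless the counter is reset.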
train_batch_size_per_learner
train_batch_size_per_learner
The number of samples given to each learner during training. Must be divisible by minibatch_size.
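The divisibility requirement, and how the batch settings interact with num_epochs, can be checked with a small sketch. The helper updates_per_batch is hypothetical, and the count assumes each epoch walks the whole batch once in non-overlapping minibatches.

```python
def updates_per_batch(train_batch_size_per_learner=256, minibatch_size=128, num_epochs=5):
    """Number of minibatch updates performed on one training batch."""
    if train_batch_size_per_learner % minibatch_size != 0:
        raise ValueError(
            "train_batch_size_per_learner must be divisible by minibatch_size"
        )
    minibatches_per_epoch = train_batch_size_per_learner // minibatch_size
    return minibatches_per_epoch * num_epochs


# Documented defaults: 256 / 128 = 2 minibatches per epoch, times 5 epochs.
print(updates_per_batch())

# A minibatch size that does not divide the batch is rejected.
try:
    updates_per_batch(256, 100)
except ValueError as err:
    print(err)
```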
name
name