FSB3PPOSettings

A struct to hold PPO settings for an SB3 training script.

struct FSB3PPOSettings : public FTrainingSettings

Methods

virtual void GenerateTrainingArgs(FScriptArgBuilder &ArgBuilder) const

Generate the training arguments for the script using the ArgBuilder.
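As a rough illustration of what generating arguments from settings could look like, the sketch below mirrors three of the documented fields in plain C++ and pushes them into a stand-in builder. `FSketchArgBuilder`, `FSketchPPOSettings`, and the flag names are hypothetical; the real `FScriptArgBuilder` interface and the script's actual options may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for FScriptArgBuilder; the real class may look different.
class FSketchArgBuilder
{
public:
    void AddFloatArg(const std::string& Name, float Value)
    {
        assert(!Name.empty());
        Args.push_back("--" + Name);
        Args.push_back(std::to_string(Value));
    }

    void AddIntArg(const std::string& Name, int Value)
    {
        assert(!Name.empty());
        Args.push_back("--" + Name);
        Args.push_back(std::to_string(Value));
    }

    // Emits a bare flag only when the condition holds.
    void AddFlag(const std::string& Name, bool bCondition)
    {
        assert(!Name.empty());
        if (bCondition)
        {
            Args.push_back("--" + Name);
        }
    }

    std::vector<std::string> Args;
};

// Plain C++ mirror of three documented fields with their documented defaults.
struct FSketchPPOSettings
{
    float LearningRate = 0.0003f;
    int NSteps = 2048;
    bool UseSDE = false;

    // Flag names are illustrative assumptions, not the script's real options.
    void GenerateTrainingArgs(FSketchArgBuilder& ArgBuilder) const
    {
        ArgBuilder.AddFloatArg("learning-rate", LearningRate);
        ArgBuilder.AddIntArg("n-steps", NSteps);
        ArgBuilder.AddFlag("use-sde", UseSDE);
    }
};
```

With the defaults, the sketch emits two name/value pairs and omits the `use-sde` flag because `UseSDE` is false.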

virtual ~FSB3PPOSettings()

Members

float LearningRate = 0.0003

The learning rate for the PPO algorithm.

int NSteps = 2048

The number of environment steps to collect between policy updates. In Stable-Baselines3 this count is per environment.

int BatchSize = 64

The batch size to use during gradient descent.

int NEpochs = 10

The number of passes (epochs) over the collected rollout data during each policy update.
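NSteps, BatchSize, and NEpochs together determine how many gradient updates one training step performs. The helper below is a minimal sketch of that arithmetic; `NumEnvs` (the number of parallel environments) is an assumption and not a field of this struct.

```cpp
#include <cassert>

// Number of minibatch gradient updates performed in one PPO training step.
// NumEnvs is an assumed input: the number of environments filling the rollout buffer.
int GradientUpdatesPerTrainingStep(int NSteps, int NumEnvs, int BatchSize, int NEpochs)
{
    assert(NSteps > 0 && NumEnvs > 0 && BatchSize > 0 && NEpochs > 0);
    const int RolloutSize = NSteps * NumEnvs;
    // Round up so that a final, smaller minibatch is still counted.
    const int MinibatchesPerEpoch = (RolloutSize + BatchSize - 1) / BatchSize;
    return MinibatchesPerEpoch * NEpochs;
}
```

With the defaults and a single environment, 2048 samples split into 32 minibatches of 64, and 10 epochs give 320 gradient updates per training step.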

float Gamma = 0.99

The discount factor (gamma) applied to future rewards.

float GAELambda = 0.95

The lambda value for Generalized Advantage Estimation (GAE), which trades off bias against variance in the advantage estimates.
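To show where Gamma and GAELambda enter the computation, here is a minimal sketch of the standard GAE recursion over one trajectory. It assumes no episode ends inside the trajectory, and `ComputeGAE` is a hypothetical helper rather than part of Schola or Stable-Baselines3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes GAE advantages backwards over a single trajectory.
// LastValue is the value estimate for the state after the final step.
std::vector<float> ComputeGAE(const std::vector<float>& Rewards,
                              const std::vector<float>& Values,
                              float LastValue, float Gamma, float GAELambda)
{
    assert(Rewards.size() == Values.size());
    std::vector<float> Advantages(Rewards.size(), 0.0f);
    float Running = 0.0f;
    float NextValue = LastValue;
    for (std::size_t i = Rewards.size(); i-- > 0;)
    {
        // One-step temporal-difference residual.
        const float Delta = Rewards[i] + Gamma * NextValue - Values[i];
        // Exponentially weighted sum of residuals, decayed by Gamma * GAELambda.
        Running = Delta + Gamma * GAELambda * Running;
        Advantages[i] = Running;
        NextValue = Values[i];
    }
    return Advantages;
}
```

Setting GAELambda to 0 reduces each advantage to its one-step residual, while values closer to 1 blend in more of the later rewards.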

float ClipRange = 0.2

The clip range for the PPO algorithm, which limits how far the ratio between the new and old policy probabilities may move from 1.
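The effect of ClipRange can be seen in the per-sample clipped surrogate objective from the PPO formulation. The function below is a minimal sketch of that term; `ClippedSurrogate` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>

// Per-sample PPO clipped surrogate objective (to be maximized).
// Ratio is the new policy probability divided by the old policy probability.
float ClippedSurrogate(float Ratio, float Advantage, float ClipRange)
{
    assert(ClipRange >= 0.0f);
    const float Lower = 1.0f - ClipRange;
    const float Upper = 1.0f + ClipRange;
    const float ClippedRatio = std::max(Lower, std::min(Ratio, Upper));
    // Taking the minimum keeps the pessimistic estimate of the two terms.
    return std::min(Ratio * Advantage, ClippedRatio * Advantage);
}
```

With ClipRange = 0.2, a ratio of 1.5 on a positive advantage is treated as 1.2, so the update gains nothing from moving the policy further in that direction.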

bool NormalizeAdvantage = true

Whether to normalize the advantage values.
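Advantage normalization usually means shifting a minibatch of advantages to zero mean and scaling it to unit standard deviation. The sketch below shows one way to do that; the sample (n - 1) standard deviation and the 1e-8 epsilon are assumptions of this example, and `NormalizeAdvantages` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalizes advantages in place to zero mean and unit standard deviation.
void NormalizeAdvantages(std::vector<float>& Advantages)
{
    assert(Advantages.size() > 1);
    const float Count = static_cast<float>(Advantages.size());

    float Mean = 0.0f;
    for (float A : Advantages)
    {
        Mean += A;
    }
    Mean /= Count;

    float SumSquares = 0.0f;
    for (float A : Advantages)
    {
        SumSquares += (A - Mean) * (A - Mean);
    }
    // Sample standard deviation (assumed); epsilon guards against division by zero.
    const float Std = std::sqrt(SumSquares / (Count - 1.0f));

    for (float& A : Advantages)
    {
        A = (A - Mean) / (Std + 1e-8f);
    }
}
```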

float EntCoef = 0.0

The entropy coefficient for the PPO algorithm.

float VFCoef = 0.05

The value function coefficient for the PPO algorithm.
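EntCoef and VFCoef weight the entropy and value-function terms when the PPO loss is assembled. The function below sketches the usual combination; `CombinedPPOLoss` and its inputs are hypothetical names, and the three loss terms are assumed to be computed elsewhere.

```cpp
#include <cassert>

// Combines the three PPO loss terms into the scalar that is minimized.
// MeanEntropy is the average policy entropy over the minibatch.
float CombinedPPOLoss(float PolicyLoss, float ValueLoss, float MeanEntropy,
                      float EntCoef, float VFCoef)
{
    assert(EntCoef >= 0.0f && VFCoef >= 0.0f);
    // Entropy is subtracted so that higher entropy lowers the loss.
    return PolicyLoss - EntCoef * MeanEntropy + VFCoef * ValueLoss;
}
```

With the documented default EntCoef of 0.0, the entropy term drops out and only the policy and value losses contribute.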

float MaxGradNorm = 0.5

The maximum gradient norm for the PPO algorithm.
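Clipping by maximum gradient norm rescales the whole gradient vector when its L2 norm exceeds the limit, leaving its direction unchanged. The sketch below illustrates the idea on a flat vector; `ClipGradNorm` is a hypothetical helper, not the routine the training script calls.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Rescales Gradients in place so that their L2 norm does not exceed MaxGradNorm.
void ClipGradNorm(std::vector<float>& Gradients, float MaxGradNorm)
{
    assert(MaxGradNorm > 0.0f);
    float SumSquares = 0.0f;
    for (float G : Gradients)
    {
        SumSquares += G * G;
    }
    const float Norm = std::sqrt(SumSquares);
    if (Norm > MaxGradNorm)
    {
        // A uniform scale shortens the vector without changing its direction.
        const float Scale = MaxGradNorm / Norm;
        for (float& G : Gradients)
        {
            G *= Scale;
        }
    }
}
```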

bool UseSDE = false

Whether to use state-dependent exploration (SDE) noise.

int SDESampleFreq = -1

How often, in steps, to resample the state-dependent exploration noise. In Stable-Baselines3, a value of -1 resamples only at the start of each rollout.
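The interaction between UseSDE and SDESampleFreq can be summarized as a small predicate evaluated at each rollout step. This is a sketch modeled on how Stable-Baselines3 describes the setting; `ShouldResampleNoise` and `StepInRollout` are hypothetical names.

```cpp
#include <cassert>

// Returns true when the exploration noise should be resampled at this rollout step.
bool ShouldResampleNoise(bool UseSDE, int SDESampleFreq, int StepInRollout)
{
    assert(StepInRollout >= 0);
    if (!UseSDE)
    {
        return false;
    }
    if (StepInRollout == 0)
    {
        // Noise is always drawn fresh at the start of a rollout.
        return true;
    }
    // A non-positive frequency (such as the default -1) disables mid-rollout resampling.
    return SDESampleFreq > 0 && StepInRollout % SDESampleFreq == 0;
}
```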

Source: Source/ScholaTraining/Public/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.h