FSB3PPOSettings

A struct that holds Proximal Policy Optimization (PPO) settings for a Stable Baselines3 (SB3) training script.

struct FSB3PPOSettings : public FTrainingSettings

Methods

GenerateTrainingArgs

virtual void GenerateTrainingArgs(FScriptArgBuilder &ArgBuilder) const

Generate the training arguments for the script using the ArgBuilder.

Parameters

  • ArgBuilder (FScriptArgBuilder)
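As a rough illustration of the pattern, the sketch below shows how a settings struct might hand its fields to a builder that assembles command-line arguments. It is plain C++ with no Unreal dependencies. `ArgBuilderSketch`, `PPOSettingsSketch`, the `Add...` methods, and the flag names are all hypothetical stand-ins; the real `FScriptArgBuilder` interface and the flags the training script accepts may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for FScriptArgBuilder; the real class may expose different methods.
class ArgBuilderSketch {
public:
    void AddIntArg(const std::string& name, int value) {
        assert(!name.empty());
        out_ << " --" << name << ' ' << value;
    }
    void AddFloatArg(const std::string& name, float value) {
        assert(!name.empty());
        out_ << " --" << name << ' ' << value;
    }
    // Boolean settings are emitted as bare flags, and only when enabled.
    void AddFlag(const std::string& name, bool enabled) {
        assert(!name.empty());
        if (enabled) out_ << " --" << name;
    }
    std::string Build() const { return out_.str(); }

private:
    std::ostringstream out_;
};

// Hypothetical mirror of a few documented attributes, using the documented defaults.
struct PPOSettingsSketch {
    float LearningRate = 0.0003f;
    int NSteps = 2048;
    int BatchSize = 64;
    bool UseSDE = false;

    // Flag names below are assumptions, not the script's confirmed interface.
    void GenerateTrainingArgs(ArgBuilderSketch& builder) const {
        builder.AddFloatArg("learning-rate", LearningRate);
        builder.AddIntArg("n-steps", NSteps);
        builder.AddIntArg("batch-size", BatchSize);
        builder.AddFlag("use-sde", UseSDE);
    }
};

// Builds the argument string for default settings.
std::string BuildDefaultArgs() {
    PPOSettingsSketch settings;
    ArgBuilderSketch builder;
    settings.GenerateTrainingArgs(builder);
    return builder.Build();
}
```

Keeping the formatting in the builder means each settings struct only has to list its own fields.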

~FSB3PPOSettings

virtual ~FSB3PPOSettings()

Attributes

LearningRate

float LearningRate = 0.0003

The learning rate for the PPO algorithm.


NSteps

int NSteps = 2048

The number of environment steps to collect, per environment, between policy updates.


BatchSize

int BatchSize = 64

The minibatch size to use for each gradient descent step.


NEpochs

int NEpochs = 10

The number of optimization epochs (passes over the collected rollout data) to run for each policy update.
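NSteps, BatchSize, and NEpochs together determine how many gradient steps each policy update performs. The sketch below works through that arithmetic. The number of parallel environments (`n_envs`) is not part of this struct and is an assumed input, and the function name is illustrative.

```cpp
#include <cassert>

// Number of minibatch gradient steps in one policy update.
// n_envs (parallel environment count) is an assumed input, not a field of FSB3PPOSettings.
int GradientStepsPerUpdate(int n_steps, int n_envs, int batch_size, int n_epochs) {
    assert(n_steps > 0 && n_envs > 0 && batch_size > 0 && n_epochs > 0);
    const int rollout_size = n_steps * n_envs;
    // Round up: a rollout that is not a multiple of batch_size ends with a smaller minibatch.
    const int minibatches_per_epoch = (rollout_size + batch_size - 1) / batch_size;
    return minibatches_per_epoch * n_epochs;
}
```

With the defaults above and a single environment, `GradientStepsPerUpdate(2048, 1, 64, 10)` gives 32 minibatches per epoch and 320 gradient steps per update. Choosing a BatchSize that divides `NSteps * n_envs` evenly avoids the truncated final minibatch.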


Gamma

float Gamma = 0.99

The discount factor (gamma) applied to future rewards.
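To make the role of gamma concrete, here is a minimal sketch of a discounted return over a finite reward sequence. The function name is illustrative and not part of the plugin.

```cpp
#include <cassert>
#include <vector>

// Computes r_0 + gamma * r_1 + gamma^2 * r_2 + ... by accumulating from the last reward backwards.
double DiscountedReturn(const std::vector<double>& rewards, double gamma) {
    assert(gamma >= 0.0 && gamma <= 1.0);
    double ret = 0.0;
    for (auto it = rewards.rbegin(); it != rewards.rend(); ++it) {
        ret = *it + gamma * ret;
    }
    return ret;
}
```

Values of gamma close to 1 weight distant rewards almost as heavily as immediate ones; smaller values make the agent more short-sighted.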


GAELambda

float GAELambda = 0.95

The lambda value for Generalized Advantage Estimation (GAE), which trades off bias against variance in the advantage estimates.
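The following sketch shows the standard GAE recursion for a single trajectory, to illustrate how Gamma and GAELambda interact. It assumes the trajectory contains no episode terminations and that `last_value` is the value estimate for the state after the final step. The function name and signature are illustrative, not the plugin's or SB3's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// GAE over one trajectory with no terminations (a simplifying assumption).
// values[i] is the value estimate at step i; last_value bootstraps the step after the end.
std::vector<double> ComputeGAE(const std::vector<double>& rewards,
                               const std::vector<double>& values,
                               double last_value, double gamma, double lambda) {
    assert(rewards.size() == values.size());
    std::vector<double> advantages(rewards.size(), 0.0);
    double next_value = last_value;
    double next_advantage = 0.0;
    // Walk backwards so each step can reuse the advantage of the step after it.
    for (std::size_t i = rewards.size(); i-- > 0;) {
        const double delta = rewards[i] + gamma * next_value - values[i];  // TD error
        next_advantage = delta + gamma * lambda * next_advantage;
        advantages[i] = next_advantage;
        next_value = values[i];
    }
    return advantages;
}
```

With lambda set to 0 the estimate reduces to the one-step TD error (low variance, more bias); with lambda set to 1 it becomes the full discounted return minus the value baseline (less bias, more variance).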


ClipRange

float ClipRange = 0.2

The clip range for the PPO surrogate objective, which limits how far the policy probability ratio can move from 1 in a single update.
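A minimal per-sample sketch of the clipped surrogate objective, assuming the usual PPO formulation, shows what ClipRange controls. Here `ratio` is the new policy's probability of the action divided by the old policy's. The function name is illustrative.

```cpp
#include <algorithm>
#include <cassert>

// Per-sample clipped surrogate objective (to be maximized).
double ClippedSurrogate(double ratio, double advantage, double clip_range) {
    assert(clip_range >= 0.0);
    // Restrict the ratio to [1 - clip_range, 1 + clip_range].
    const double clipped_ratio =
        std::max(1.0 - clip_range, std::min(ratio, 1.0 + clip_range));
    // Taking the minimum gives a pessimistic bound, so large ratio changes earn no extra credit.
    return std::min(ratio * advantage, clipped_ratio * advantage);
}
```

With a clip range of 0.2, a ratio of 1.5 and an advantage of 1 yields an objective of about 1.2 rather than 1.5, which removes the incentive to push the ratio further.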


NormalizeAdvantage

bool NormalizeAdvantage = true

Whether to normalize the advantage values.
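Advantage normalization typically rescales a batch of advantages to zero mean and unit standard deviation. The sketch below shows one common form, assuming a sample standard deviation and a small epsilon to avoid division by zero. The exact statistics and epsilon used by the training script are not stated here, so treat those details as assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rescales advantages to roughly zero mean and unit standard deviation.
// Uses the sample standard deviation (n - 1 divisor) and an assumed epsilon of 1e-8.
std::vector<double> NormalizeAdvantages(const std::vector<double>& advantages) {
    assert(advantages.size() > 1);
    const double n = static_cast<double>(advantages.size());
    double mean = 0.0;
    for (double a : advantages) mean += a;
    mean /= n;
    double sum_sq = 0.0;
    for (double a : advantages) sum_sq += (a - mean) * (a - mean);
    const double std_dev = std::sqrt(sum_sq / (n - 1.0));
    std::vector<double> normalized(advantages.size());
    for (std::size_t i = 0; i < advantages.size(); ++i) {
        normalized[i] = (advantages[i] - mean) / (std_dev + 1e-8);
    }
    return normalized;
}
```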


EntCoef

float EntCoef = 0.0

The weight of the entropy term in the PPO loss; larger values encourage more exploration.


VFCoef

float VFCoef = 0.05

The weight of the value function term in the PPO loss.
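EntCoef and VFCoef weight the entropy and value function terms of the PPO loss. The sketch below assumes the usual combined form, where `entropy_loss` is the negative mean policy entropy so that minimizing the total loss increases entropy. The function name is illustrative.

```cpp
// Combined PPO loss (to be minimized), assuming the usual three-term form.
// entropy_loss is assumed to be the negative mean policy entropy.
double PPOLoss(double policy_loss, double entropy_loss, double value_loss,
               double ent_coef, double vf_coef) {
    return policy_loss + ent_coef * entropy_loss + vf_coef * value_loss;
}
```

With EntCoef at its default of 0.0, the entropy term drops out and only the policy and value terms contribute.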


MaxGradNorm

float MaxGradNorm = 0.5

The maximum gradient norm; gradients with a larger norm are scaled down to this value (gradient clipping).
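Gradient clipping by global norm can be sketched as follows. This is a plain C++ illustration over a flat gradient vector; the small constant added to the norm is an assumption modeled on common deep learning libraries, and the function name is illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Scales the gradient in place so its L2 norm does not exceed max_norm.
void ClipGradNorm(std::vector<double>& grads, double max_norm) {
    assert(max_norm > 0.0);
    double sum_sq = 0.0;
    for (double g : grads) sum_sq += g * g;
    const double total_norm = std::sqrt(sum_sq);
    // The 1e-6 term (an assumed constant) guards against division by zero.
    const double scale = max_norm / (total_norm + 1e-6);
    if (scale < 1.0) {
        for (double& g : grads) g *= scale;
    }
}
```

For example, a gradient of (3, 4) has norm 5; with a maximum norm of 0.5 it is scaled to roughly (0.3, 0.4), preserving its direction.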


UseSDE

bool UseSDE = false

Whether to use generalized state-dependent exploration (gSDE) noise instead of independent action noise.


SDESampleFreq

int SDESampleFreq = -1

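The number of steps between resampling the state-dependent exploration noise when UseSDE is enabled. The default of -1 samples the noise only at the start of each rollout.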
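The interaction between UseSDE and SDESampleFreq can be sketched as a small predicate. This assumes the noise is always sampled once at the start of a rollout and then every SDESampleFreq steps when that value is positive; the function name and the step counter are illustrative.

```cpp
#include <cassert>

// Decides whether to resample exploration noise at a given step of a rollout.
// step counts from 0 within the current rollout (an assumed convention).
bool ShouldResampleNoise(bool use_sde, int sde_sample_freq, int step) {
    assert(step >= 0);
    if (!use_sde) return false;
    if (step == 0) return true;  // always sample at the start of the rollout
    return sde_sample_freq > 0 && step % sde_sample_freq == 0;
}
```

With the default of -1, the noise drawn at the start of the rollout is kept for the rest of it.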

Source: Source/ScholaTraining/Public/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.h