Struct FSB3PPOSettings

  • Defined in File SB3PPOSettings.h

Inheritance Relationships

Base Type

struct FSB3PPOSettings : public FTrainingSettings

Source: Source/Schola/Training/Public/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.h

Dependencies: FScriptArgBuilder

A struct to hold PPO settings for an SB3 training script.


Public Functions

| Symbol | Details |
| --- | --- |
| GenerateTrainingArgs | Generate the training arguments for the script using the ArgBuilder. |
| ~FSB3PPOSettings | |

GenerateTrainingArgs

virtual void GenerateTrainingArgs(FScriptArgBuilder &ArgBuilder) const

Generate the training arguments for the script using the ArgBuilder.

Populates the ArgBuilder with training-specific command-line arguments.

Parameters

| # | Direction | Name | Type | Description |
| --- | --- | --- | --- | --- |
| 1 | inout | ArgBuilder | FScriptArgBuilder & | The builder to use to generate the arguments. |

Attributes: const, virtual

Source: Source/Schola/Training/Public/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.h

Implementation: Source/Schola/Training/Private/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.cpp
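
The following is a minimal sketch of the pattern GenerateTrainingArgs describes: a const method that writes the struct's settings into a builder as command-line arguments. The interface of FScriptArgBuilder is not shown on this page, so `FArgBuilderSketch`, its `AddArg` and `AddFlag` methods, and the flag names are all hypothetical stand-ins, not Schola's actual API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for FScriptArgBuilder; the real interface is not shown on this page.
struct FArgBuilderSketch
{
    std::vector<std::string> Args;

    // Append "--Name Value" as two separate arguments.
    void AddArg(const std::string& Name, const std::string& Value)
    {
        Args.push_back("--" + Name);
        Args.push_back(Value);
    }

    // Append "--Name" only when the flag is enabled.
    void AddFlag(const std::string& Name, bool bEnabled)
    {
        if (bEnabled)
        {
            Args.push_back("--" + Name);
        }
    }
};

// Hypothetical subset of the PPO settings, using defaults documented on this page.
struct FPPOSettingsSketch
{
    float LearningRate = 0.0003f;
    int   NSteps = 2048;
    int   BatchSize = 64;
    bool  UseSDE = false;

    // Same shape as GenerateTrainingArgs: const, and writes into the builder.
    // The flag names below are assumptions chosen for illustration.
    void GenerateTrainingArgs(FArgBuilderSketch& ArgBuilder) const
    {
        ArgBuilder.AddArg("learning-rate", std::to_string(LearningRate));
        ArgBuilder.AddArg("n-steps", std::to_string(NSteps));
        ArgBuilder.AddArg("batch-size", std::to_string(BatchSize));
        ArgBuilder.AddFlag("use-sde", UseSDE);
    }
};
```

Because the method is const, the same settings object can be reused to generate arguments for several launches without being modified.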


~FSB3PPOSettings

virtual ~FSB3PPOSettings()

Attributes: virtual

Source: Source/Schola/Training/Public/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.h

Implementation: Source/Schola/Training/Private/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.cpp


Public Members

| Symbol | Details |
| --- | --- |
| LearningRate | The learning rate for the PPO algorithm. |
| NSteps | The number of steps to collect from each environment between policy updates. |
| BatchSize | The batch size to use during gradient descent. |
| NEpochs | The number of optimization epochs to run over the collected data for each policy update. |
| Gamma | The discount factor (gamma) for the PPO algorithm. |
| GAELambda | The lambda value for Generalized Advantage Estimation (GAE) in the PPO algorithm. |
| ClipRange | The clip range for the PPO algorithm. |
| NormalizeAdvantage | Whether to normalize the advantage values. |
| EntCoef | The entropy coefficient for the PPO algorithm. |
| VFCoef | The value function coefficient for the PPO algorithm. |
| MaxGradNorm | The maximum gradient norm for the PPO algorithm. |
| UseSDE | Whether to use generalized state-dependent exploration (gSDE) noise. |
| SDESampleFreq | How often, in steps, to resample the state-dependent exploration noise. |

LearningRate

float LearningRate = 0.0003

The learning rate for the PPO algorithm.


NSteps

int NSteps = 2048

The number of steps to collect from each environment between policy updates.


BatchSize

int BatchSize = 64

The batch size to use during gradient descent.


NEpochs

int NEpochs = 10

The number of optimization epochs to run over the collected data for each policy update.
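
NSteps, BatchSize, and NEpochs together determine how much optimization happens per policy update. The sketch below works through that arithmetic. `ComputeUpdateCounts` and `FUpdateCounts` are hypothetical helpers, and `NumEnvs` (the number of parallel environments) is an assumption, since it is not a member of this struct.

```cpp
#include <stdexcept>

// Hypothetical helper type for this example only.
struct FUpdateCounts
{
    int RolloutSize;         // transitions collected per policy update
    int MinibatchesPerEpoch; // minibatches needed to cover the rollout once
    int GradientSteps;       // optimizer steps per policy update
};

// NumEnvs is an assumption: this page does not say how many environments run in parallel.
inline FUpdateCounts ComputeUpdateCounts(int NSteps, int NumEnvs, int BatchSize, int NEpochs)
{
    if (NSteps <= 0 || NumEnvs <= 0 || BatchSize <= 0 || NEpochs <= 0)
    {
        throw std::invalid_argument("all arguments must be positive");
    }

    const int RolloutSize = NSteps * NumEnvs;
    // Round up so a trailing partial minibatch is still counted.
    const int Minibatches = (RolloutSize + BatchSize - 1) / BatchSize;
    return FUpdateCounts{RolloutSize, Minibatches, Minibatches * NEpochs};
}
```

With the defaults on this page and a single environment, `ComputeUpdateCounts(2048, 1, 64, 10)` gives 2048 transitions, 32 minibatches per epoch, and 320 gradient steps per update. Choosing a BatchSize that divides the rollout size evenly avoids a truncated final minibatch.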


Gamma

float Gamma = 0.99

The discount factor (gamma) for the PPO algorithm, which controls how strongly future rewards are weighted relative to immediate ones.


GAELambda

float GAELambda = 0.95

The lambda value for Generalized Advantage Estimation (GAE) in the PPO algorithm, which trades off bias against variance in the advantage estimates.
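
To show how Gamma and GAELambda interact, here is a minimal sketch of the standard GAE recursion over a single rollout. `ComputeGAE` is a hypothetical helper, not part of Schola, and it assumes no episode terminates inside the rollout (a real implementation would reset the recursion at episode boundaries).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Rewards[t] and Values[t] cover steps t = 0..T-1.
// LastValue is the value estimate for the state reached after the final step.
// Assumption: no episode ends inside the rollout.
inline std::vector<double> ComputeGAE(const std::vector<double>& Rewards,
                                      const std::vector<double>& Values,
                                      double LastValue,
                                      double Gamma,
                                      double Lambda)
{
    if (Rewards.size() != Values.size())
    {
        throw std::invalid_argument("Rewards and Values must have the same length");
    }

    const std::size_t T = Rewards.size();
    std::vector<double> Advantages(T, 0.0);
    double Running = 0.0;

    // Walk backwards so each step can reuse the accumulated advantage of the next one.
    for (std::size_t i = T; i-- > 0;)
    {
        const double NextValue = (i + 1 < T) ? Values[i + 1] : LastValue;
        // One-step temporal-difference error.
        const double Delta = Rewards[i] + Gamma * NextValue - Values[i];
        Running = Delta + Gamma * Lambda * Running;
        Advantages[i] = Running;
    }
    return Advantages;
}
```

With Lambda set to 0 each advantage reduces to the one-step TD error (lower variance, more bias); with Lambda set to 1 it becomes the full discounted return minus the value baseline (less bias, more variance).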


ClipRange

float ClipRange = 0.2

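The clip range (epsilon) for the PPO algorithm, which limits how far the new policy's probability ratio may move from 1 in a single update.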
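
For reference, the clipped surrogate objective from the PPO algorithm can be written as follows, with ClipRange playing the role of $\epsilon$. The symbols ($r_t$, $\hat{A}_t$, $\pi_\theta$) follow common PPO notation and are not names used by Schola.

```latex
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t\!\left[
      \min\!\left(
        r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t
      \right)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

With the default of 0.2, the ratio $r_t(\theta)$ is clipped to the interval $[0.8, 1.2]$.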


NormalizeAdvantage

bool NormalizeAdvantage = true

Whether to normalize the advantage values.
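
Advantage normalization usually means shifting each minibatch of advantages to zero mean and scaling it to unit standard deviation. The sketch below illustrates that idea. `NormalizeAdvantages` is a hypothetical helper, and the use of the sample standard deviation and a `1e-8` stabilizer are assumptions about the training script, not details confirmed by this page.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns (A - mean) / (std + Epsilon) for each advantage A.
// Assumptions: sample standard deviation (N - 1 denominator) and a small Epsilon
// to avoid division by zero. Inputs with fewer than two values are returned unchanged.
inline std::vector<double> NormalizeAdvantages(const std::vector<double>& Advantages,
                                               double Epsilon = 1e-8)
{
    const std::size_t N = Advantages.size();
    if (N < 2)
    {
        return Advantages;
    }

    double Mean = 0.0;
    for (double A : Advantages)
    {
        Mean += A;
    }
    Mean /= static_cast<double>(N);

    double SumSquares = 0.0;
    for (double A : Advantages)
    {
        SumSquares += (A - Mean) * (A - Mean);
    }
    const double Std = std::sqrt(SumSquares / static_cast<double>(N - 1));

    std::vector<double> Normalized;
    Normalized.reserve(N);
    for (double A : Advantages)
    {
        Normalized.push_back((A - Mean) / (Std + Epsilon));
    }
    return Normalized;
}
```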


EntCoef

float EntCoef = 0.0

The entropy coefficient for the PPO algorithm.


VFCoef

float VFCoef = 0.05

The value function coefficient for the PPO algorithm.
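
EntCoef and VFCoef act as weights in the combined PPO loss. As commonly written (and assumed here to match how the training script combines the terms), the loss being minimized is:

```latex
L(\theta)
  = -\,L^{\mathrm{CLIP}}(\theta)
    \;+\; c_{\mathrm{vf}}\, L^{\mathrm{VF}}(\theta)
    \;-\; c_{\mathrm{ent}}\, \mathbb{E}_t\!\left[\mathcal{H}\!\left[\pi_\theta\right](s_t)\right]
```

Here $L^{\mathrm{CLIP}}$ is the clipped surrogate policy objective, $L^{\mathrm{VF}}$ is the value function error, $\mathcal{H}$ is the policy entropy, $c_{\mathrm{vf}}$ corresponds to VFCoef, and $c_{\mathrm{ent}}$ corresponds to EntCoef. A larger EntCoef rewards more exploratory policies; a larger VFCoef gives the value function error more weight relative to the policy term.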


MaxGradNorm

float MaxGradNorm = 0.5

The maximum gradient norm for the PPO algorithm.
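
Gradient-norm clipping rescales the whole gradient when its global L2 norm exceeds the limit, leaving its direction unchanged. The sketch below shows the idea on a flat vector. `ClipGradNorm` is a hypothetical helper for illustration, not a Schola or training-script function.

```cpp
#include <cmath>
#include <vector>

// Scales Gradients in place so their L2 norm does not exceed MaxNorm.
// Returns the norm measured before clipping.
inline double ClipGradNorm(std::vector<double>& Gradients, double MaxNorm)
{
    double SumSquares = 0.0;
    for (double G : Gradients)
    {
        SumSquares += G * G;
    }
    const double TotalNorm = std::sqrt(SumSquares);

    if (TotalNorm > MaxNorm && TotalNorm > 0.0)
    {
        const double Scale = MaxNorm / TotalNorm;
        for (double& G : Gradients)
        {
            G *= Scale;
        }
    }
    return TotalNorm;
}
```

For example, a gradient of (3, 4) has norm 5; with MaxGradNorm at its default of 0.5 it is scaled by 0.1 to (0.3, 0.4).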


UseSDE

bool UseSDE = false

Whether to use generalized state-dependent exploration (gSDE) noise.


SDESampleFreq

int SDESampleFreq = -1

How often, in steps, to resample the state-dependent exploration noise. The default of -1 resamples only at the start of each rollout.
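
The resampling schedule can be sketched as a small predicate. This assumes the Stable-Baselines3 convention that noise is always resampled at the start of a rollout and that a non-positive frequency disables any further resampling. `ShouldResampleNoise` is a hypothetical helper, not part of Schola.

```cpp
// Returns true when the exploration noise should be resampled at this step.
// StepInRollout counts from 0 at the start of each rollout.
// Assumption: follows the Stable-Baselines3 convention described above.
inline bool ShouldResampleNoise(bool bUseSDE, int SDESampleFreq, int StepInRollout)
{
    if (!bUseSDE)
    {
        return false; // state-dependent exploration is disabled
    }
    if (StepInRollout == 0)
    {
        return true; // always resample at the start of a rollout
    }
    return SDESampleFreq > 0 && StepInRollout % SDESampleFreq == 0;
}
```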