Struct FSB3PPOSettings

Defined in File SB3PPOSettings.h

Inheritance Relationships

Base Type

public FTrainingSettings (Struct FTrainingSettings)

struct FSB3PPOSettings : public FTrainingSettings

Source: Source/Schola/Training/Public/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.h

Dependencies: FScriptArgBuilder

A struct to hold PPO settings for an SB3 training script.

Public Functions

Symbol	Details
`GenerateTrainingArgs`	Generate the training arguments for the script using the ArgBuilder.
`~FSB3PPOSettings`	Virtual destructor.

`GenerateTrainingArgs`

virtual void GenerateTrainingArgs(FScriptArgBuilder &ArgBuilder) const

Generates the training arguments for the script by populating `ArgBuilder` with training-specific command-line arguments.

Parameters

#	Direction	Name	Type	Description
1	inout	`ArgBuilder`	`FScriptArgBuilder &`	The builder to use to generate the arguments.

Attributes: const, virtual

Source: Source/Schola/Training/Public/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.h

Implementation: Source/Schola/Training/Private/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.cpp
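
As a rough illustration of what populating a builder with training arguments could look like, the standalone sketch below uses a hypothetical `FArgBuilderSketch` in place of `FScriptArgBuilder` (whose real interface is not shown on this page) and a hypothetical `FPPOSettingsSketch` holding a few of the members documented below. The method names `AddArg` and `AddFlag` and all flag names are assumptions for illustration, not the API or flags Schola actually uses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for FScriptArgBuilder; the real interface is not shown here.
struct FArgBuilderSketch
{
    std::vector<std::string> Args;

    // Appends "--Name" and "Value" as two separate tokens.
    void AddArg(const std::string& Name, const std::string& Value)
    {
        assert(!Name.empty());
        Args.push_back("--" + Name);
        Args.push_back(Value);
    }

    // Appends "--Name" only when the condition holds.
    void AddFlag(const std::string& Name, bool bCondition)
    {
        if (bCondition)
        {
            Args.push_back("--" + Name);
        }
    }
};

// Hypothetical subset of the settings, using defaults documented on this page.
struct FPPOSettingsSketch
{
    int  NSteps = 2048;
    int  BatchSize = 64;
    bool UseSDE = false;

    // Mirrors the const, by-reference shape of GenerateTrainingArgs.
    void GenerateTrainingArgs(FArgBuilderSketch& ArgBuilder) const
    {
        ArgBuilder.AddArg("n-steps", std::to_string(NSteps));       // assumed flag name
        ArgBuilder.AddArg("batch-size", std::to_string(BatchSize)); // assumed flag name
        ArgBuilder.AddFlag("use-sde", UseSDE);                      // assumed flag name
    }
};
```

Because the method is `const` and takes the builder by reference, the settings stay unchanged while the builder accumulates the output.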

`~FSB3PPOSettings`

virtual ~FSB3PPOSettings()

Attributes: virtual

Source: Source/Schola/Training/Public/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.h

Implementation: Source/Schola/Training/Private/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.cpp

Public Members

Symbol	Details
`LearningRate`	The learning rate for the PPO algorithm.
`NSteps`	The number of steps to collect from each environment between policy updates.
`BatchSize`	The batch size to use during gradient descent.
`NEpochs`	The number of optimization epochs to run at each policy update.
`Gamma`	The discount factor (gamma) for the PPO algorithm.
`GAELambda`	The lambda value for Generalized Advantage Estimation (GAE) in the PPO algorithm.
`ClipRange`	The clip range for the PPO algorithm.
`NormalizeAdvantage`	Whether to normalize the advantage values.
`EntCoef`	The entropy coefficient for the PPO algorithm.
`VFCoef`	The value function coefficient for the PPO algorithm.
`MaxGradNorm`	The maximum gradient norm for the PPO algorithm.
`UseSDE`	Whether to use state-dependent exploration (SDE) noise.
`SDESampleFreq`	How often, in steps, to resample the state-dependent exploration (SDE) noise.

`LearningRate`

float LearningRate = 0.0003

The learning rate for the PPO algorithm.

`NSteps`

int NSteps = 2048

The number of steps to collect from each environment between policy updates.

`BatchSize`

int BatchSize = 64

The batch size to use during gradient descent.

`NEpochs`

int NEpochs = 10

The number of optimization epochs to run over the collected rollout at each policy update.
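
`NSteps`, `BatchSize`, and `NEpochs` together determine how much optimization happens after each rollout. The sketch below works through that arithmetic, assuming the Stable-Baselines3 behavior where the rollout holds `NSteps` samples per environment and is split into minibatches of `BatchSize`. The function names and the `NumEnvs` parameter are hypothetical; the number of environments is not part of this struct.

```cpp
#include <cassert>

// Number of minibatches one pass over the rollout is split into
// (the last minibatch may be smaller than BatchSize).
inline int MinibatchesPerEpoch(int NSteps, int NumEnvs, int BatchSize)
{
    assert(NSteps > 0 && NumEnvs > 0 && BatchSize > 0);
    const int RolloutSize = NSteps * NumEnvs;
    return (RolloutSize + BatchSize - 1) / BatchSize; // ceiling division
}

// Total number of gradient steps taken after each rollout.
inline int GradientStepsPerRollout(int NSteps, int NumEnvs, int BatchSize, int NEpochs)
{
    return NEpochs * MinibatchesPerEpoch(NSteps, NumEnvs, BatchSize);
}
```

With the defaults on this page and a single environment, a rollout of 2048 samples is split into 32 minibatches of 64, giving 320 gradient steps per policy update.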

`Gamma`

float Gamma = 0.99

The discount factor (gamma) for the PPO algorithm.

`GAELambda`

float GAELambda = 0.95

The lambda value for Generalized Advantage Estimation (GAE), which trades off bias against variance in the advantage estimates.
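
To show how `Gamma` and `GAELambda` interact, here is a minimal sketch of the standard GAE recursion for a single trajectory. It is not Schola's or Stable-Baselines3's code: the function name `ComputeGAE` is hypothetical, and the sketch assumes no episode ends inside the trajectory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes GAE advantages for one trajectory with no episode ends inside it.
// Values[t] is the value estimate at step t; LastValue is the value estimate
// for the state reached after the final step (used for bootstrapping).
inline std::vector<float> ComputeGAE(const std::vector<float>& Rewards,
                                     const std::vector<float>& Values,
                                     float LastValue, float Gamma, float GAELambda)
{
    assert(Rewards.size() == Values.size());
    std::vector<float> Advantages(Rewards.size(), 0.0f);
    float NextValue = LastValue;
    float NextAdvantage = 0.0f;
    // Walk backwards so each step can reuse the advantage of the step after it.
    for (std::size_t i = Rewards.size(); i-- > 0;)
    {
        const float Delta = Rewards[i] + Gamma * NextValue - Values[i]; // TD error
        NextAdvantage = Delta + Gamma * GAELambda * NextAdvantage;
        Advantages[i] = NextAdvantage;
        NextValue = Values[i];
    }
    return Advantages;
}
```

Setting `GAELambda` to 0 reduces each advantage to the one-step TD error, while values closer to 1 blend in more of the later TD errors.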

`ClipRange`

float ClipRange = 0.2

The clip range for the PPO algorithm.
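
The clip range bounds how far the probability ratio between the new and old policy can move the objective. The following sketch of the per-sample clipped surrogate objective from the PPO algorithm illustrates this; the function name `ClippedSurrogate` is hypothetical and the code is an illustration, not the training script's implementation.

```cpp
#include <algorithm>

// Per-sample clipped surrogate objective (a quantity to be maximized).
// Ratio is pi_new(a|s) / pi_old(a|s) for the sampled action.
inline float ClippedSurrogate(float Ratio, float Advantage, float ClipRange)
{
    // Restrict the ratio to [1 - ClipRange, 1 + ClipRange].
    const float Clipped = std::max(1.0f - ClipRange, std::min(Ratio, 1.0f + ClipRange));
    // Take the more pessimistic of the unclipped and clipped terms.
    return std::min(Ratio * Advantage, Clipped * Advantage);
}
```

With the default `ClipRange` of 0.2, a sample with positive advantage stops contributing additional objective once the ratio exceeds 1.2.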

`NormalizeAdvantage`

bool NormalizeAdvantage = true

Whether to normalize the advantage values.
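
Advantage normalization typically rescales each batch of advantages to zero mean and unit standard deviation. A minimal sketch follows; the function name `NormalizeAdvantages`, the use of the sample standard deviation, and the epsilon of `1e-8` are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Rescales advantages in place to zero mean and unit standard deviation.
inline void NormalizeAdvantages(std::vector<float>& Advantages)
{
    const std::size_t N = Advantages.size();
    if (N < 2)
    {
        return; // a standard deviation is not defined for fewer than two samples
    }
    float Mean = 0.0f;
    for (float A : Advantages) { Mean += A; }
    Mean /= static_cast<float>(N);

    float SumSq = 0.0f;
    for (float A : Advantages) { SumSq += (A - Mean) * (A - Mean); }
    const float Std = std::sqrt(SumSq / static_cast<float>(N - 1)); // sample std (assumed)

    for (float& A : Advantages)
    {
        A = (A - Mean) / (Std + 1e-8f); // small epsilon guards against division by zero
    }
}
```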

`EntCoef`

float EntCoef = 0.0

The entropy coefficient for the PPO algorithm.

`VFCoef`

float VFCoef = 0.5

The value function coefficient for the PPO algorithm.
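
`EntCoef` and `VFCoef` weight the entropy and value terms in the combined PPO loss. The sketch below shows the usual way the three terms are combined, assuming a loss that is minimized; the function name `CombinedPPOLoss` and its parameters are hypothetical.

```cpp
// Combines the PPO loss terms into the single scalar that is minimized.
// PolicyLoss: negated clipped surrogate objective, averaged over the minibatch.
// ValueLoss:  value function regression error.
// Entropy:    mean policy entropy (subtracted, so higher entropy lowers the loss).
inline float CombinedPPOLoss(float PolicyLoss, float ValueLoss, float Entropy,
                             float EntCoef, float VFCoef)
{
    return PolicyLoss + VFCoef * ValueLoss - EntCoef * Entropy;
}
```

With the default `EntCoef` of 0.0, the entropy term drops out entirely and no exploration bonus is applied.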

`MaxGradNorm`

float MaxGradNorm = 0.5

The maximum gradient norm for the PPO algorithm.
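
Gradient norm clipping rescales the whole gradient when its L2 norm exceeds the limit, preserving its direction. A minimal sketch over a flat gradient vector follows; the function name `ClipGradNorm` is hypothetical, and a real training script would apply this across all parameter tensors.

```cpp
#include <cmath>
#include <vector>

// Scales Gradients in place so their L2 norm does not exceed MaxGradNorm.
// Returns the norm measured before clipping.
inline float ClipGradNorm(std::vector<float>& Gradients, float MaxGradNorm)
{
    float SumSq = 0.0f;
    for (float G : Gradients) { SumSq += G * G; }
    const float TotalNorm = std::sqrt(SumSq);

    if (TotalNorm > MaxGradNorm)
    {
        const float Scale = MaxGradNorm / TotalNorm;
        for (float& G : Gradients) { G *= Scale; }
    }
    return TotalNorm;
}
```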

`UseSDE`

bool UseSDE = false

Whether to use state-dependent exploration (SDE) noise.

`SDESampleFreq`

int SDESampleFreq = -1

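How often, in steps, to resample the state-dependent exploration (SDE) noise. In Stable-Baselines3, a value of -1 means the noise is sampled only at the start of each rollout.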
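
The resampling rule can be sketched as a small predicate. It follows the Stable-Baselines3 meaning of `sde_sample_freq`, where noise is always sampled at the start of a rollout and a positive frequency adds periodic resampling. The function name `ShouldResampleNoise` is hypothetical, and how Schola forwards this value to the script is not shown on this page.

```cpp
// Returns true when new exploration noise should be sampled at this rollout step.
inline bool ShouldResampleNoise(bool UseSDE, int SDESampleFreq, int StepInRollout)
{
    if (!UseSDE)
    {
        return false; // SDE disabled: there is no noise to resample
    }
    if (StepInRollout == 0)
    {
        return true; // always sample at the start of a rollout
    }
    // A non-positive frequency (such as the default -1) disables periodic resampling.
    return SDESampleFreq > 0 && StepInRollout % SDESampleFreq == 0;
}
```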