Struct FSB3PPOSettings
- Defined in File SB3PPOSettings.h
Inheritance Relationships
Base Type
public FTrainingSettings (Struct FTrainingSettings)
struct FSB3PPOSettings : public FTrainingSettings
Source: Source/Schola/Training/Public/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.h
Dependencies: FScriptArgBuilder
A struct to hold PPO settings for an SB3 training script.
Public Functions
| Symbol | Details |
|---|---|
| GenerateTrainingArgs | Generate the training arguments for the script using the ArgBuilder. |
| ~FSB3PPOSettings | Virtual destructor. |
GenerateTrainingArgs
virtual void GenerateTrainingArgs(FScriptArgBuilder &ArgBuilder) const
Generate the training arguments for the script using the ArgBuilder. Populates the ArgBuilder with training-specific command-line arguments.
Parameters
| # | Direction | Name | Type | Description |
|---|---|---|---|---|
| 1 | inout | ArgBuilder | FScriptArgBuilder & | The builder to use to generate the arguments. |
Attributes: const, virtual
Source: Source/Schola/Training/Public/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.h
Implementation: Source/Schola/Training/Private/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.cpp
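The following is a minimal, standalone sketch of the pattern GenerateTrainingArgs describes: a settings struct appending its fields to a builder as command-line arguments. It does not use the real Schola types. FMockArgBuilder, FMockPPOSettings, the AddArg and AddFlag methods, and the flag names such as "learning-rate" are all hypothetical stand-ins, since this reference does not show the FScriptArgBuilder interface or the flags the actual implementation emits.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for FScriptArgBuilder. The real Schola class and its
// method names are not shown in this reference and may differ.
struct FMockArgBuilder
{
    std::vector<std::string> Args;

    // Append "--Name Value" as two separate argument tokens.
    template <typename T>
    void AddArg(const std::string& Name, const T& Value)
    {
        assert(!Name.empty());
        std::ostringstream Stream;
        Stream << Value;
        Args.push_back("--" + Name);
        Args.push_back(Stream.str());
    }

    // Append "--Name" only when the flag is enabled.
    void AddFlag(const std::string& Name, bool bEnabled)
    {
        assert(!Name.empty());
        if (bEnabled)
        {
            Args.push_back("--" + Name);
        }
    }
};

// Hypothetical, reduced copy of the settings struct (only four members).
struct FMockPPOSettings
{
    float LearningRate = 0.0003f;
    int NSteps = 2048;
    int BatchSize = 64;
    bool UseSDE = false;

    // Mirrors the documented signature: const, takes the builder by reference.
    void GenerateTrainingArgs(FMockArgBuilder& ArgBuilder) const
    {
        ArgBuilder.AddArg("learning-rate", LearningRate); // flag names are assumed
        ArgBuilder.AddArg("n-steps", NSteps);
        ArgBuilder.AddArg("batch-size", BatchSize);
        ArgBuilder.AddFlag("use-sde", UseSDE);
    }
};
```

Because the method is const and takes the builder by reference, the same settings object can be applied to several builders without being modified.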
~FSB3PPOSettings
virtual ~FSB3PPOSettings()
Attributes: virtual
Source: Source/Schola/Training/Public/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.h
Implementation: Source/Schola/Training/Private/TrainingSettings/StableBaselines/Algorithms/SB3PPOSettings.cpp
Public Members
| Symbol | Details |
|---|---|
| LearningRate | The learning rate for the PPO algorithm. |
| NSteps | The number of environment steps to take between training updates. |
| BatchSize | The batch size to use during gradient descent. |
| NEpochs | The number of epochs to train for during each training update. |
| Gamma | The gamma value (discount factor) for the PPO algorithm. |
| GAELambda | The Generalized Advantage Estimation (GAE) lambda value for the PPO algorithm. |
| ClipRange | The clip range for the PPO algorithm. |
| NormalizeAdvantage | Whether to normalize the advantage values. |
| EntCoef | The entropy coefficient for the PPO algorithm. |
| VFCoef | The value function coefficient for the PPO algorithm. |
| MaxGradNorm | The maximum gradient norm for the PPO algorithm. |
| UseSDE | Whether to use state-dependent exploration (SDE) noise. |
| SDESampleFreq | The frequency at which to sample the state-dependent exploration noise. |
LearningRate
float LearningRate = 0.0003
The learning rate for the PPO algorithm.
NSteps
int NSteps = 2048
The number of environment steps to take between training updates.
BatchSize
int BatchSize = 64
The batch size to use during gradient descent.
NEpochs
int NEpochs = 10
The number of epochs to train for during each training update.
Gamma
float Gamma = 0.99
The gamma value (discount factor) for the PPO algorithm.
GAELambda
float GAELambda = 0.95
The Generalized Advantage Estimation (GAE) lambda value for the PPO algorithm.
ClipRange
float ClipRange = 0.2
The clip range for the PPO algorithm.
NormalizeAdvantage
bool NormalizeAdvantage = true
Whether to normalize the advantage values.
EntCoef
float EntCoef = 0.0
The entropy coefficient for the PPO algorithm.
VFCoef
float VFCoef = 0.05
The value function coefficient for the PPO algorithm.
MaxGradNorm
float MaxGradNorm = 0.5
The maximum gradient norm for the PPO algorithm.
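For orientation, here is how ClipRange, VFCoef, and EntCoef enter the loss, assuming they are passed through unchanged to the clip_range, vf_coef, and ent_coef parameters of Stable Baselines3 PPO (this reference does not show the mapping). The symbols r_t (probability ratio between the new and old policy), \hat{A}_t (advantage estimate), L^{\mathrm{value}} (value-function regression loss), and H (policy entropy) are standard PPO notation rather than names from this struct.

```latex
\begin{aligned}
L^{\mathrm{policy}}(\theta) &= -\,\mathbb{E}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\big(r_t(\theta),\,1-\mathrm{ClipRange},\,1+\mathrm{ClipRange}\big)\,\hat{A}_t\Big)\right] \\
L(\theta) &= L^{\mathrm{policy}}(\theta)
  \;+\; \mathrm{VFCoef}\cdot L^{\mathrm{value}}(\theta)
  \;-\; \mathrm{EntCoef}\cdot \mathbb{E}_t\!\big[H[\pi_\theta](s_t)\big]
\end{aligned}
```

Under the same assumption, MaxGradNorm is not part of the loss: it caps the norm of the gradient of L before each optimizer step.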
UseSDE
bool UseSDE = false
Whether to use state-dependent exploration (SDE) noise.
SDESampleFreq
int SDESampleFreq = -1
The frequency at which to sample the state-dependent exploration noise.
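As a rough guide to how NSteps, BatchSize, and NEpochs interact, the sketch below computes the work done in one training update. It assumes the values are forwarded to Stable Baselines3 PPO's n_steps, batch_size, and n_epochs, where each update collects NSteps steps per environment and then makes NEpochs passes over the rollout in minibatches of BatchSize. FUpdatePlan, PlanUpdate, and NumEnvs are hypothetical names introduced for illustration; the number of parallel environments is not a member of this struct.

```cpp
#include <cassert>

// Hypothetical helper type: summarizes the work done in one PPO update.
struct FUpdatePlan
{
    int RolloutSize;         // transitions collected before the update
    int MinibatchesPerEpoch; // minibatches in one pass over the rollout
    int GradientSteps;       // optimizer steps across all epochs
};

// NumEnvs (number of parallel environments) is an assumed extra input.
FUpdatePlan PlanUpdate(int NSteps, int BatchSize, int NEpochs, int NumEnvs)
{
    assert(NSteps > 0 && BatchSize > 0 && NEpochs > 0 && NumEnvs > 0);

    const int RolloutSize = NSteps * NumEnvs;
    // Round up: a final, smaller minibatch is still processed.
    const int Minibatches = (RolloutSize + BatchSize - 1) / BatchSize;
    return FUpdatePlan{RolloutSize, Minibatches, Minibatches * NEpochs};
}
```

With the defaults listed above and a single environment, PlanUpdate(2048, 64, 10, 1) gives a rollout of 2048 transitions, 32 minibatches per epoch, and 320 gradient steps per update. Choosing BatchSize as a divisor of NSteps * NumEnvs avoids a truncated final minibatch.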