FSB3PPOSettings
struct FSB3PPOSettings : public FTrainingSettings
A struct to hold PPO settings for an SB3 training script.
Note: This is a partial implementation of the PPO settings and does not cover every PPO option.
Dependencies: FScriptArgBuilder, FTrainingSettings
Inherits from: public FTrainingSettings
Public Interface
Destructor:
~FSB3PPOSettings
virtual ~FSB3PPOSettings()
Attributes: virtual
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 74, column 9)
Implementation: Schola/Source/Schola/Private/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.cpp (lines 27-29)
Public Functions:
GenerateTrainingArgs
virtual void GenerateTrainingArgs(int Port, FScriptArgBuilder &ArgBuilder) const
Generate the training arguments for the script using the ArgBuilder.
Note: Port is supplied because it is a common argument to pass to scripts; it is set at a high level but might be needed by specific sub-settings.
Parameters:
Port (int) – [in] The port to use for the script.
ArgBuilder (FScriptArgBuilder &) – [in] The builder to use to generate the arguments.
Attributes: const, virtual
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 72, column 6)
Implementation: Schola/Source/Schola/Private/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.cpp (lines 6-25)
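The sketch below illustrates the general pattern a GenerateTrainingArgs override could follow: each setting is appended to the builder as a command-line argument. It is a standalone sketch under stated assumptions, not the Schola implementation. FakeArgBuilder, its AddArg/AddFlag methods, and every flag name are hypothetical stand-ins, because the real FScriptArgBuilder interface is not documented in this section; the FTrainingSettings base and Unreal types are also omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// HYPOTHETICAL stand-in for FScriptArgBuilder: it only records arguments in order.
// The real FScriptArgBuilder interface is not documented in this section.
struct FakeArgBuilder {
    std::vector<std::string> Args;
    void AddArg(const std::string& Flag, const std::string& Value) {
        Args.push_back(Flag);
        Args.push_back(Value);
    }
    void AddFlag(const std::string& Flag) { Args.push_back(Flag); }
};

// Plain C++ mirror of the documented members and their defaults (no Unreal types).
struct PPOSettingsSketch {
    float LearningRate = 0.0003f;
    int NSteps = 2048;
    int BatchSize = 64;
    int NEpochs = 10;
    float Gamma = 0.99f;
    float GAELambda = 0.95f;
    float ClipRange = 0.2f;
    bool NormalizeAdvantage = true;
    float EntCoef = 0.0f;
    float VFCoef = 0.05f;
    float MaxGradNorm = 0.5f;
    bool UseSDE = false;
    int SDESampleFreq = -1;

    // All flag names are HYPOTHETICAL; they only illustrate the pattern.
    // Port is unused here: it is passed in for sub-settings that need it.
    void GenerateTrainingArgs([[maybe_unused]] int Port, FakeArgBuilder& ArgBuilder) const {
        ArgBuilder.AddArg("--learning-rate", std::to_string(LearningRate));
        ArgBuilder.AddArg("--n-steps", std::to_string(NSteps));
        ArgBuilder.AddArg("--batch-size", std::to_string(BatchSize));
        ArgBuilder.AddArg("--n-epochs", std::to_string(NEpochs));
        ArgBuilder.AddArg("--gamma", std::to_string(Gamma));
        ArgBuilder.AddArg("--gae-lambda", std::to_string(GAELambda));
        ArgBuilder.AddArg("--clip-range", std::to_string(ClipRange));
        ArgBuilder.AddArg("--ent-coef", std::to_string(EntCoef));
        ArgBuilder.AddArg("--vf-coef", std::to_string(VFCoef));
        ArgBuilder.AddArg("--max-grad-norm", std::to_string(MaxGradNorm));
        // Boolean options become presence-only flags in this sketch.
        if (NormalizeAdvantage) {
            ArgBuilder.AddFlag("--normalize-advantage");
        }
        if (UseSDE) {
            ArgBuilder.AddFlag("--use-sde");
            ArgBuilder.AddArg("--sde-sample-freq", std::to_string(SDESampleFreq));
        }
    }
};
```

Emitting boolean options as presence-only flags keeps the script side simple, but it is only one possible convention; the actual Schola script may expect explicit true/false values.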
Public Members:
float LearningRate
float LearningRate = 0.0003
The learning rate for the PPO algorithm.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 22, column 7)
int NSteps
int NSteps = 2048
The number of steps to run in each environment between policy updates (the rollout length).
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 26, column 5)
int BatchSize
int BatchSize = 64
The minibatch size used for each gradient step.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 30, column 5)
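Because each update collects NSteps transitions per environment, BatchSize works best as a divisor of NSteps multiplied by the number of parallel environments; otherwise every epoch ends with a smaller, truncated minibatch (Stable-Baselines3 warns about this case). The sketch below checks that arithmetic. NumEnvs is an assumption: the number of parallel environments is not a member of this struct.

```cpp
#include <cassert>

// Total transitions collected per PPO update: NSteps per environment.
// NumEnvs is an ASSUMED input; it is not a member of FSB3PPOSettings.
int RolloutBufferSize(int NSteps, int NumEnvs) {
    assert(NSteps > 0 && NumEnvs > 0);
    return NSteps * NumEnvs;
}

// Size of the final, truncated minibatch in each epoch.
// A result of 0 means BatchSize divides the rollout buffer evenly.
int TruncatedMinibatchSize(int NSteps, int NumEnvs, int BatchSize) {
    assert(BatchSize > 0);
    return RolloutBufferSize(NSteps, NumEnvs) % BatchSize;
}
```

With the defaults (NSteps = 2048, BatchSize = 64) the buffer splits evenly for any number of environments, since 2048 is already a multiple of 64.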
int NEpochs
int NEpochs = 10
The number of passes (epochs) over the collected rollout data for each policy update.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 34, column 5)
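NSteps, BatchSize, and NEpochs together determine how many optimizer steps one PPO update performs: each epoch walks the whole rollout buffer in minibatches. The sketch computes that count as an upper bound (Stable-Baselines3 can stop the epochs early when a KL target is configured, which is not among the documented members). NumEnvs is again an assumed input.

```cpp
#include <cassert>

// Upper bound on optimizer steps per PPO update. Each epoch covers the full
// rollout buffer in minibatches of BatchSize, the last one possibly truncated.
// NumEnvs is an ASSUMED input; it is not a member of FSB3PPOSettings.
int GradientStepsPerUpdate(int NSteps, int NumEnvs, int BatchSize, int NEpochs) {
    assert(NSteps > 0 && NumEnvs > 0 && BatchSize > 0 && NEpochs > 0);
    const int BufferSize = NSteps * NumEnvs;
    const int MinibatchesPerEpoch = (BufferSize + BatchSize - 1) / BatchSize; // ceiling division
    return NEpochs * MinibatchesPerEpoch;
}
```

For the defaults with a single environment this is 10 epochs × 32 minibatches = 320 gradient steps per update.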
float Gamma
float Gamma = 0.99
The discount factor (gamma) for the PPO algorithm.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 38, column 7)
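Gamma weights future rewards geometrically: a reward k steps ahead contributes Gamma^k times its value to the return. The short sketch below computes the discounted return of a finite reward sequence to make that weighting concrete; it is an illustration of the formula, not code from Schola or Stable-Baselines3.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Discounted return of a finite reward sequence:
//   G = r0 + Gamma * r1 + Gamma^2 * r2 + ...
// Computed backwards so each step reuses the return of the step after it.
double DiscountedReturn(const std::vector<double>& Rewards, double Gamma) {
    assert(Gamma >= 0.0 && Gamma <= 1.0);
    double Return = 0.0;
    for (auto It = Rewards.rbegin(); It != Rewards.rend(); ++It) {
        Return = *It + Gamma * Return;
    }
    return Return;
}
```

With Gamma = 0.99, three rewards of 1 give 1 + 0.99 + 0.9801 = 2.9701; with Gamma = 0 only the immediate reward counts.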
float GAELambda
float GAELambda = 0.95
The Generalized Advantage Estimation (GAE) lambda for the PPO algorithm, which trades off bias against variance in the advantage estimate.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 42, column 7)
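The following sketch shows how Gamma and GAELambda enter Generalized Advantage Estimation for a single environment's rollout. It is stated in terms of a per-step done flag for readability; Stable-Baselines3 tracks episode starts instead, which is equivalent bookkeeping. LastValue (the value estimate of the state after the final step) is an input the caller must supply.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Generalized Advantage Estimation for one environment's rollout:
//   delta_t = r_t + Gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
//   A_t     = delta_t + Gamma * Lambda * (1 - done_t) * A_{t+1}
// Dones[t] marks that the episode ended at step t (no bootstrapping past it).
// LastValue is the value estimate of the state after the final step.
std::vector<double> ComputeGAE(const std::vector<double>& Rewards,
                               const std::vector<double>& Values,
                               const std::vector<bool>& Dones,
                               double LastValue, double Gamma, double Lambda) {
    assert(Rewards.size() == Values.size() && Rewards.size() == Dones.size());
    std::vector<double> Advantages(Rewards.size(), 0.0);
    double NextAdvantage = 0.0;
    double NextValue = LastValue;
    for (int t = static_cast<int>(Rewards.size()) - 1; t >= 0; --t) {
        const double NotDone = Dones[t] ? 0.0 : 1.0;
        const double Delta = Rewards[t] + Gamma * NextValue * NotDone - Values[t];
        NextAdvantage = Delta + Gamma * Lambda * NotDone * NextAdvantage;
        Advantages[t] = NextAdvantage;
        NextValue = Values[t];
    }
    return Advantages;
}
```

The two extremes show the trade-off: Lambda = 0 reduces each advantage to a one-step TD error (low variance, more bias), while Lambda = 1 with zero value estimates recovers the full discounted return (high variance, less bias).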
float ClipRange
float ClipRange = 0.2
The clip range (epsilon) applied to the probability ratio in the PPO surrogate objective.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 46, column 7)
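ClipRange bounds how far the new policy's probability ratio can move the objective in one update. The sketch evaluates the standard PPO clipped surrogate for a single sample; the training loss is the negative mean of this quantity over a minibatch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// PPO clipped surrogate for a single sample (to be maximized):
//   min(r * A, clip(r, 1 - ClipRange, 1 + ClipRange) * A)
// where r = pi_new(a|s) / pi_old(a|s) is the probability ratio and A the advantage.
double ClippedSurrogate(double Ratio, double Advantage, double ClipRange) {
    assert(ClipRange > 0.0);
    const double Clipped = std::clamp(Ratio, 1.0 - ClipRange, 1.0 + ClipRange);
    return std::min(Ratio * Advantage, Clipped * Advantage);
}
```

With ClipRange = 0.2 and a positive advantage, raising the ratio beyond 1.2 earns no extra objective; with a negative advantage, lowering it below 0.8 earns none, which keeps each update conservative.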
bool NormalizeAdvantage
bool NormalizeAdvantage = true
Whether to normalize the advantage values.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 50, column 6)
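When NormalizeAdvantage is enabled, advantages are rescaled per minibatch to zero mean and roughly unit standard deviation before computing the policy loss. The sketch below follows the approach Stable-Baselines3 takes (sample standard deviation plus a small epsilon, skipped for a single-element batch); exact numerical details in that library may differ.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Normalize advantages to zero mean and roughly unit variance within a minibatch.
// Uses the sample standard deviation (n - 1) and a small epsilon to avoid
// division by zero. A single-element batch is returned unchanged.
std::vector<double> NormalizeAdvantages(std::vector<double> Adv, double Eps = 1e-8) {
    assert(Eps > 0.0);
    if (Adv.size() < 2) return Adv; // normalization is not meaningful here
    const double N = static_cast<double>(Adv.size());
    const double Mean = std::accumulate(Adv.begin(), Adv.end(), 0.0) / N;
    double SqSum = 0.0;
    for (double A : Adv) SqSum += (A - Mean) * (A - Mean);
    const double Std = std::sqrt(SqSum / (N - 1.0));
    for (double& A : Adv) A = (A - Mean) / (Std + Eps);
    return Adv;
}
```

For advantages {1, 2, 3} the mean is 2 and the sample standard deviation is 1, so the result is approximately {-1, 0, 1}.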
float EntCoef
float EntCoef = 0.0
The entropy coefficient for the PPO algorithm.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 54, column 7)
float VFCoef
float VFCoef = 0.05
The value function coefficient for the PPO algorithm.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 58, column 7)
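EntCoef and VFCoef weight the entropy and value-function terms against the policy term in the single scalar loss that PPO minimizes. The sketch follows the way Stable-Baselines3 combines these terms; the three mean quantities are assumed to have been computed already over a minibatch.

```cpp
#include <cassert>
#include <cmath>

// Scalar PPO loss, combined the way Stable-Baselines3's PPO does it:
//   loss = PolicyLoss + EntCoef * EntropyLoss + VFCoef * ValueLoss
// PolicyLoss  = -mean(clipped surrogate)
// EntropyLoss = -mean(policy entropy), so a positive EntCoef rewards exploration
// ValueLoss   = mean squared error between returns and predicted values
double CombinedPPOLoss(double MeanSurrogate, double MeanEntropy, double ValueMSE,
                       double EntCoef, double VFCoef) {
    assert(EntCoef >= 0.0 && VFCoef >= 0.0);
    const double PolicyLoss = -MeanSurrogate;
    const double EntropyLoss = -MeanEntropy;
    return PolicyLoss + EntCoef * EntropyLoss + VFCoef * ValueMSE;
}
```

With the documented defaults (EntCoef = 0.0, VFCoef = 0.05) the entropy term drops out entirely, and the value error contributes only 5% of its magnitude to the loss.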
float MaxGradNorm
float MaxGradNorm = 0.5
The maximum gradient norm used for gradient clipping in the PPO algorithm.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 62, column 7)
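MaxGradNorm caps the global L2 norm of the gradient before each optimizer step. The sketch rescales a flat gradient vector in the spirit of PyTorch's clip_grad_norm_, which Stable-Baselines3 calls with this value; representing all parameters as one std::vector is a simplification for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Rescale a gradient vector so its L2 norm does not exceed MaxGradNorm.
// The small constant in the denominator mirrors common implementations and
// guards against division by zero. Returns the norm measured before clipping.
double ClipGradNorm(std::vector<double>& Grad, double MaxGradNorm) {
    assert(MaxGradNorm > 0.0);
    double SqSum = 0.0;
    for (double G : Grad) SqSum += G * G;
    const double Norm = std::sqrt(SqSum);
    const double Scale = MaxGradNorm / (Norm + 1e-6);
    if (Scale < 1.0) {
        for (double& G : Grad) G *= Scale; // direction kept, length reduced
    }
    return Norm;
}
```

A gradient of {3, 4} (norm 5) becomes approximately {0.3, 0.4} with the default MaxGradNorm of 0.5, while gradients already inside the bound are left untouched.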
bool UseSDE
bool UseSDE = false
Whether to use generalized State-Dependent Exploration (gSDE) instead of action-noise exploration.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 66, column 6)
int SDESampleFreq
int SDESampleFreq = -1
How often, in steps, to sample a new gSDE noise matrix when UseSDE is enabled; -1 samples only at the beginning of each rollout.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 70, column 5)
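UseSDE and SDESampleFreq together decide when exploration noise is redrawn during rollout collection. The sketch lists the rollout steps at which a new gSDE noise matrix would be sampled, approximating the rule in Stable-Baselines3's on-policy rollout loop (one draw at the start of the rollout, then every SDESampleFreq steps when it is positive). The function itself is a hypothetical helper for illustration.

```cpp
#include <cassert>
#include <vector>

// HYPOTHETICAL helper: rollout steps (0-based) at which new gSDE noise is drawn.
// Always once at the start of the rollout, then every SDESampleFreq steps
// if SDESampleFreq > 0. With SDESampleFreq == -1 only the initial draw happens.
std::vector<int> SDENoiseResampleSteps(bool UseSDE, int SDESampleFreq, int NSteps) {
    assert(NSteps > 0);
    std::vector<int> Steps;
    if (!UseSDE) return Steps; // no state-dependent noise at all
    Steps.push_back(0);        // initial draw at the beginning of the rollout
    if (SDESampleFreq > 0) {
        for (int Step = SDESampleFreq; Step < NSteps; Step += SDESampleFreq) {
            Steps.push_back(Step);
        }
    }
    return Steps;
}
```

With the default SDESampleFreq of -1 the same noise matrix is kept for the whole rollout of NSteps steps, which makes exploration smoother within a rollout.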
Used By: FSB3TrainingSettings
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h (line 15, column 1)