FSB3PPOSettings
struct FSB3PPOSettings : public FTrainingSettings
A struct to hold PPO settings for an SB3 training script.
Note: This is a partial implementation of the PPO settings and is not exhaustive.
Dependencies: FScriptArgBuilder, FTrainingSettings
Inherits from: public FTrainingSettings
Public Interface
Destructor:
~FSB3PPOSettings
virtual ~FSB3PPOSettings()
Attributes: virtual
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 74, column 9)
Implementation: Schola/Source/Schola/Private/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.cpp
(lines 27-29)
Public Functions:
GenerateTrainingArgs
virtual void GenerateTrainingArgs(int Port, FScriptArgBuilder &ArgBuilder) const
Generate the training arguments for the script using the ArgBuilder.
Note: Port is supplied because it is a common argument to pass to scripts. It is set at a higher level but might be needed by specific subsettings.
Parameters:
Port (int) – [in] The port to use for the script
ArgBuilder (FScriptArgBuilder &) – [in] The builder to use to generate the arguments
Attributes: const, virtual
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 72, column 6)
Implementation: Schola/Source/Schola/Private/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.cpp
(lines 6-25)
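The following is a minimal sketch of the pattern GenerateTrainingArgs describes: a settings struct writing its fields into a builder that assembles command-line arguments. It is not the Schola implementation. ArgBuilderSketch, PPOSettingsSketch, BuildDefaultArgsSketch, the builder's method names, and the flag spellings (such as --n-steps) are all hypothetical stand-ins, since this page does not show the FScriptArgBuilder interface or the flags the training script expects. Only a few of the fields are included.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for FScriptArgBuilder. The real class's methods are
// not documented on this page, so these names are assumptions.
class ArgBuilderSketch
{
public:
    void AddIntArg(const std::string& Name, int Value)
    {
        Args.push_back("--" + Name + "=" + std::to_string(Value));
    }

    void AddFloatArg(const std::string& Name, float Value)
    {
        std::ostringstream Stream;
        Stream << "--" << Name << "=" << Value;
        Args.push_back(Stream.str());
    }

    // Emits a bare flag only when the setting is enabled.
    void AddFlag(const std::string& Name, bool bEnabled)
    {
        if (bEnabled)
        {
            Args.push_back("--" + Name);
        }
    }

    // Joins all collected arguments with single spaces.
    std::string Build() const
    {
        std::string Result;
        for (const std::string& Arg : Args)
        {
            if (!Result.empty())
            {
                Result += ' ';
            }
            Result += Arg;
        }
        return Result;
    }

private:
    std::vector<std::string> Args;
};

// Hypothetical, reduced mirror of FSB3PPOSettings using defaults from this page.
struct PPOSettingsSketch
{
    float LearningRate = 0.0003f;
    int NSteps = 2048;
    int BatchSize = 64;
    int NEpochs = 10;
    bool UseSDE = false;

    void GenerateTrainingArgs(int Port, ArgBuilderSketch& ArgBuilder) const
    {
        // Port is accepted for interface parity but unused in this sketch.
        (void)Port;
        ArgBuilder.AddFloatArg("learning-rate", LearningRate); // assumed flag name
        ArgBuilder.AddIntArg("n-steps", NSteps);               // assumed flag name
        ArgBuilder.AddIntArg("batch-size", BatchSize);         // assumed flag name
        ArgBuilder.AddIntArg("n-epochs", NEpochs);             // assumed flag name
        ArgBuilder.AddFlag("use-sde", UseSDE);                 // assumed flag name
    }
};

// Builds the argument string for default settings; the port value is arbitrary.
inline std::string BuildDefaultArgsSketch()
{
    PPOSettingsSketch Settings;
    ArgBuilderSketch Builder;
    Settings.GenerateTrainingArgs(8000, Builder);
    return Builder.Build();
}
```

Passing the builder by reference lets a parent settings object hand the same builder to several subsettings in turn, which matches the note that Port is set at a higher level and forwarded down.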
Public Members:
float LearningRate
float LearningRate = 0.0003
The learning rate for the PPO algorithm.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 22, column 7)
int NSteps
int NSteps = 2048
The number of environment steps to collect between policy updates.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 26, column 5)
int BatchSize
int BatchSize = 64
The batch size to use during gradient descent.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 30, column 5)
int NEpochs
int NEpochs = 10
The number of epochs to train for during each policy update.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 34, column 5)
float Gamma
float Gamma = 0.99
The discount factor (gamma) for the PPO algorithm.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 38, column 7)
float GAELambda
float GAELambda = 0.95
The lambda value for Generalized Advantage Estimation (GAE) in the PPO algorithm.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 42, column 7)
float ClipRange
float ClipRange = 0.2
The clip range for the PPO algorithm.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 46, column 7)
bool NormalizeAdvantage
bool NormalizeAdvantage = true
Whether to normalize the advantage values.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 50, column 6)
float EntCoef
float EntCoef = 0.0
The entropy coefficient for the PPO algorithm.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 54, column 7)
float VFCoef
float VFCoef = 0.05
The value function coefficient for the PPO algorithm.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 58, column 7)
float MaxGradNorm
float MaxGradNorm = 0.5
The maximum gradient norm for the PPO algorithm.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 62, column 7)
bool UseSDE
bool UseSDE = false
Whether to use generalized State Dependent Exploration (gSDE) noise.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 66, column 6)
int SDESampleFreq
int SDESampleFreq = -1
How often, in steps, to sample new State Dependent Exploration noise.
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 70, column 5)
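NSteps, BatchSize, and NEpochs together determine how much optimization work each policy update performs. The sketch below works through that arithmetic, assuming the fields map directly onto Stable Baselines3 PPO's n_steps, batch_size, and n_epochs, where a rollout of NSteps samples per environment is split into minibatches and passed over NEpochs times. FUpdateShapeSketch, ComputeUpdateShape, and the NumEnvs parameter are hypothetical; the number of environments is not a field of this struct.

```cpp
// Hypothetical helper type describing the size of one PPO policy update.
struct FUpdateShapeSketch
{
    int RolloutSize;          // samples collected before each update
    int MinibatchesPerEpoch;  // minibatches needed to cover the rollout once
    int GradientSteps;        // total minibatch updates per policy update
    bool bEvenSplit;          // true when BatchSize divides the rollout exactly
};

// Assumes all inputs are positive. NumEnvs is an assumption of this sketch
// and is not part of FSB3PPOSettings.
inline FUpdateShapeSketch ComputeUpdateShape(int NSteps, int NumEnvs, int BatchSize, int NEpochs)
{
    FUpdateShapeSketch Shape{};
    Shape.RolloutSize = NSteps * NumEnvs;
    // Round up: a rollout that does not divide evenly ends with a smaller minibatch.
    Shape.MinibatchesPerEpoch = (Shape.RolloutSize + BatchSize - 1) / BatchSize;
    Shape.GradientSteps = Shape.MinibatchesPerEpoch * NEpochs;
    Shape.bEvenSplit = (Shape.RolloutSize % BatchSize) == 0;
    return Shape;
}
```

With the defaults listed above and a single environment, this gives a rollout of 2048 samples, 32 minibatches per epoch, and 320 gradient steps per update. Choosing a BatchSize that divides NSteps times the number of environments avoids a truncated final minibatch.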
Used By: FSB3TrainingSettings
Source: Schola/Source/Schola/Public/Subsystem/SubsystemSettings/StableBaselines/Algorithms/SB3PPOSettings.h
(line 15, column 1)