schola.scripts.sb3.settings.PPOSettings

class schola.scripts.sb3.settings.PPOSettings(learning_rate=0.0003, n_steps=2048, batch_size=64, n_epochs=10, gamma=0.99, gae_lambda=0.95, clip_range=0.2, normalize_advantage=True, ent_coef=0.0, vf_coef=0.5, max_grad_norm=0.5, use_sde=False, sde_sample_freq=-1)[source]

Bases: object

Dataclass for configuring the settings of the Proximal Policy Optimization (PPO) algorithm. This includes parameters for the learning process, such as learning rate, batch size, number of steps, and other hyperparameters that control the behavior of the PPO algorithm.
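
A minimal usage sketch (assuming Schola and Stable-Baselines3 are installed; the hyperparameter values chosen here are purely illustrative):

```python
from schola.scripts.sb3.settings import PPOSettings

# Override a few fields; every other hyperparameter keeps the default listed below.
settings = PPOSettings(
    learning_rate=1e-4,   # illustrative value, not a recommendation
    n_steps=1024,
    batch_size=128,
    ent_coef=0.01,
)

# The constructor property exposes the underlying Stable-Baselines3 algorithm class.
algorithm_cls = settings.constructor  # stable_baselines3.PPO
print(settings.name, algorithm_cls)
```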

Methods

__init__([learning_rate, n_steps, …])

Attributes

batch_size

Minibatch size for each update.

clip_range

Clipping range for the policy update.

constructor

critic_type

ent_coef

Coefficient for the entropy term in the loss function.

gae_lambda

Lambda parameter for Generalized Advantage Estimation (GAE).

gamma

Discount factor for future rewards.

learning_rate

Learning rate for the optimizer.

max_grad_norm

Maximum gradient norm for clipping.

n_epochs

Number of epochs to update the policy.

n_steps

Number of steps to run for each environment per update.

name

normalize_advantage

Whether to normalize the advantages.

sde_sample_freq

Frequency at which to sample the SDE noise.

use_sde

Whether to use State Dependent Exploration (SDE).

vf_coef

Coefficient for the value function loss in the overall loss function.

__init__(learning_rate=0.0003, n_steps=2048, batch_size=64, n_epochs=10, gamma=0.99, gae_lambda=0.95, clip_range=0.2, normalize_advantage=True, ent_coef=0.0, vf_coef=0.5, max_grad_norm=0.5, use_sde=False, sde_sample_freq=-1)

Return type:

None

batch_size: int = 64

Minibatch size for each update. This is the number of timesteps used in each batch for training the policy. Must be a divisor of n_steps.
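
As a rough illustration of how batch_size interacts with n_steps and n_epochs (assuming a single environment, so one rollout holds exactly n_steps transitions):

```python
# With the defaults shown here, each policy update splits the rollout into
# 2048 / 64 = 32 minibatches and repeats that split for n_epochs passes.
n_steps, batch_size, n_epochs = 2048, 64, 10
minibatches_per_epoch = n_steps // batch_size                  # 32
gradient_steps_per_update = minibatches_per_epoch * n_epochs   # 320
```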

clip_range: float = 0.2

Clipping range for the policy update. This bounds how far the probability ratio between the new and old policies can move away from 1 during each update, which helps to prevent large policy changes that can destabilize training.
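
For reference, clip_range plays the role of ε in the clipped surrogate objective of Schulman et al. (2017), sketched below with r_t(θ) the probability ratio between the new and old policies and Â_t the advantage estimate:

```latex
% Clipped surrogate objective; \epsilon corresponds to clip_range.
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right]
```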

property constructor: Type[PPO]

property critic_type: str

ent_coef: float = 0.0

Coefficient for the entropy term in the loss function. This encourages exploration by rewarding entropy in the policy’s action distribution (equivalently, penalizing overly confident action choices). A higher value encourages more exploration, while a lower value makes the policy more deterministic. Set to 0.0 to disable entropy regularization.

gae_lambda: float = 0.95

Lambda parameter for Generalized Advantage Estimation (GAE). This parameter helps to balance bias and variance in the advantage estimation. A value of 1.0 corresponds to an unbiased but high-variance Monte Carlo style estimate, while lower values reduce variance at the cost of some bias.
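
For reference, the GAE estimate from Schulman et al. (2015), written in terms of the TD residual δ_t and the value function V:

```latex
% TD residual and the GAE(\gamma, \lambda) advantage estimate.
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad
\hat{A}_t = \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\,\delta_{t+l}
```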

gamma: float = 0.99

Discount factor for future rewards. This determines how much the agent values future rewards compared to immediate rewards. A value of 0.99 means that future rewards are discounted by 1% per time step.
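
For reference, gamma enters the discounted return shown below; with γ = 0.99 a reward 100 steps ahead is weighted by 0.99^100 ≈ 0.37, giving an effective planning horizon of roughly 1/(1 − γ) = 100 steps.

```latex
% Discounted return from time step t.
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}
```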

learning_rate: float = 0.0003

Learning rate for the optimizer.

max_grad_norm: float = 0.5

Maximum gradient norm for clipping. This is used to prevent exploding gradients by scaling down the gradients if their norm exceeds this value. This can help to stabilize training, especially in environments with high variance in the rewards or gradients.
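
A minimal PyTorch sketch of this kind of clipping (the model, loss, and optimizer here are placeholders; Stable-Baselines3 performs the equivalent step internally during its PPO update):

```python
import torch

model = torch.nn.Linear(4, 2)                                    # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

loss = model(torch.randn(8, 4)).pow(2).mean()                    # placeholder loss
loss.backward()
# Rescale gradients so that their global norm does not exceed max_grad_norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
optimizer.step()
```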

n_epochs: int = 10

Number of epochs to update the policy. This is the number of times the model iterates over the collected rollout data during each update. More epochs can improve convergence but also increase the risk of overfitting to the current batch of data.

n_steps: int = 2048

Number of steps to run for each environment per update. This is the number of timesteps collected before updating the policy.

property name: str

normalize_advantage: bool = True

Whether to normalize the advantages. Normalizing the advantages can help to stabilize training by ensuring that they have a mean of 0 and a standard deviation of 1. This can lead to more consistent updates to the policy.
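
A minimal sketch of what this normalization looks like (the advantage values are made up; the small epsilon guards against division by zero, as is typical in Stable-Baselines3 style implementations):

```python
import numpy as np

advantages = np.array([1.5, -0.3, 0.7, 2.1])   # made-up advantage estimates
normalized = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
```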

sde_sample_freq: int = -1

Frequency at which to sample the SDE noise. This determines how often a new noise matrix is sampled when using State Dependent Exploration (SDE). A value of -1 means the noise is sampled only at the beginning of each rollout, while a positive integer specifies the number of steps between samples. This can help to control the exploration behavior of the agent.

use_sde: bool = False

Whether to use State Dependent Exploration (SDE). This can help to improve exploration by adapting the exploration noise based on the current state of the environment. When set to True, it will use SDE for exploration instead of the standard exploration strategy.

vf_coef: float = 0.5

Coefficient for the value function loss in the overall loss function. This determines how much weight is given to the value function loss compared to the policy loss. A higher value will put more emphasis on accurately estimating the value function, while a lower value will prioritize the policy update.
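
For reference, both coefficients enter the combined PPO objective from Schulman et al. (2017), written below as a quantity to maximize (implementations typically minimize its negative), with c₁ = vf_coef and c₂ = ent_coef:

```latex
% Combined objective: clipped policy term, value-function loss, entropy bonus.
L_t(\theta) = \hat{\mathbb{E}}_t\left[\, L_t^{CLIP}(\theta) \;-\; c_1\, L_t^{VF}(\theta) \;+\; c_2\, S[\pi_\theta](s_t) \,\right]
```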
