schola.scripts.sb3.settings.SACSettings

class schola.scripts.sb3.settings.SACSettings(learning_rate=0.0003, buffer_size=1000000, learning_starts=100, batch_size=256, tau=0.005, gamma=0.99, train_freq=1, gradient_steps=1, action_noise=None, replay_buffer_class=None, replay_buffer_kwargs=None, optimize_memory_usage=False, ent_coef='auto', target_update_interval=1, target_entropy='auto', use_sde=False, sde_sample_freq=-1)[source]

Bases: object

Dataclass for configuring the settings of the Soft Actor-Critic (SAC) algorithm. This includes parameters for the learning process, such as learning rate, buffer size, batch size, and other hyperparameters that control the behavior of the SAC algorithm.
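For orientation, below is a minimal sketch of constructing the dataclass; only the import path from this page’s title, the documented fields, and the constructor property are assumed, and the overridden values are purely illustrative.

    from schola.scripts.sb3.settings import SACSettings

    # Override a few hyperparameters; every other field keeps its documented default.
    settings = SACSettings(
        learning_rate=1e-4,
        buffer_size=500_000,
        batch_size=128,
        ent_coef="auto",
    )

    # The constructor property exposes the underlying Stable Baselines3 algorithm class (SAC),
    # which a training script can then instantiate with these settings.
    algorithm_class = settings.constructor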

Methods

__init__([learning_rate, buffer_size, …])

Attributes

action_noise

Action noise to use for exploration.

batch_size

Minibatch size for each update.

buffer_size

Size of the replay buffer.

constructor

critic_type

ent_coef

Coefficient for the entropy term in the loss function.

gamma

Discount factor for future rewards.

gradient_steps

Number of gradient steps to take during each training update.

learning_rate

Learning rate for the optimizer.

learning_starts

Number of timesteps before learning starts.

name

optimize_memory_usage

Whether to optimize memory usage for the replay buffer.

replay_buffer_class

Class to use for the replay buffer.

replay_buffer_kwargs

Additional keyword arguments to pass to the replay buffer constructor.

sde_sample_freq

Frequency at which to sample the SDE noise.

target_entropy

Target entropy for the entropy regularization.

target_update_interval

Interval for updating the target networks.

tau

Soft update parameter for the target networks.

train_freq

Frequency of training the policy.

use_sde

Whether to use State Dependent Exploration (SDE).

Parameters:
  • learning_rate (float)

  • buffer_size (int)

  • learning_starts (int)

  • batch_size (int)

  • tau (float)

  • gamma (float)

  • train_freq (int)

  • gradient_steps (int)

  • action_noise (Any)

  • replay_buffer_class (Any)

  • replay_buffer_kwargs (dict)

  • optimize_memory_usage (bool)

  • ent_coef (Any)

  • target_update_interval (int)

  • target_entropy (Any)

  • use_sde (bool)

  • sde_sample_freq (int)

__init__(learning_rate=0.0003, buffer_size=1000000, learning_starts=100, batch_size=256, tau=0.005, gamma=0.99, train_freq=1, gradient_steps=1, action_noise=None, replay_buffer_class=None, replay_buffer_kwargs=None, optimize_memory_usage=False, ent_coef='auto', target_update_interval=1, target_entropy='auto', use_sde=False, sde_sample_freq=-1)
Parameters:
  • learning_rate (float)

  • buffer_size (int)

  • learning_starts (int)

  • batch_size (int)

  • tau (float)

  • gamma (float)

  • train_freq (int)

  • gradient_steps (int)

  • action_noise (Any | None)

  • replay_buffer_class (Any | None)

  • replay_buffer_kwargs (dict | None)

  • optimize_memory_usage (bool)

  • ent_coef (Any)

  • target_update_interval (int)

  • target_entropy (Any)

  • use_sde (bool)

  • sde_sample_freq (int)

Return type:

None

action_noise: Any = None

Action noise to use for exploration. This can be a callable function or a noise process (e.g., Ornstein-Uhlenbeck) that adds noise to the actions taken by the policy to encourage exploration. This is important in continuous action spaces to help the agent explore different actions and avoid getting stuck in local optima. If set to None, no noise will be added to the actions.
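As an illustration, a Gaussian noise process from Stable Baselines3 could be supplied here; this is only a sketch, and the action dimensionality below is a placeholder that must match your environment’s action space.

    import numpy as np
    from stable_baselines3.common.noise import NormalActionNoise
    from schola.scripts.sb3.settings import SACSettings

    n_actions = 2  # placeholder: set this to your environment's action dimensionality
    noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

    settings = SACSettings(action_noise=noise)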

batch_size: int = 256

Minibatch size for each update. This is the number of samples drawn from the replay buffer to perform a single update to the policy. A larger batch size can lead to more stable updates but requires more memory. Must be less than or equal to buffer_size.

buffer_size: int = 1000000

Size of the replay buffer. This is the number of transitions (state, action, reward, next state) that can be stored in the buffer. A larger buffer allows for more diverse samples to be used for training, which can improve performance but also increases memory usage.

property constructor: Type[SAC]
property critic_type: str
ent_coef: Any = 'auto'

Coefficient for the entropy term in the loss function. This encourages exploration by adding a penalty for certainty in the policy’s action distribution. A higher value encourages more exploration, while a lower value makes the policy more deterministic. When set to 'auto', the coefficient is learned automatically during training so that the policy’s entropy tracks target_entropy, which helps balance exploration and exploitation.

gamma: float = 0.99

Discount factor for future rewards. This determines how much the agent values future rewards compared to immediate rewards. A value of 0.99 means that future rewards are discounted by 1% per time step. This is important for balancing the trade-off between short-term and long-term rewards in reinforcement learning.
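As a reference point, the quantity being discounted is the standard return

    G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}

so with \gamma = 0.99 the effective planning horizon is roughly 1 / (1 - \gamma) = 100 steps.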

gradient_steps: int = 1

Number of gradient steps to take during each training update. This specifies how many times to update the model parameters using the sampled minibatch from the replay buffer. A value of 1 means that the model is updated once per training step, while a higher value (e.g., 2) means that the model is updated multiple times. This can help to improve convergence but may also lead to overfitting if set too high.

learning_rate: float = 0.0003

Learning rate for the optimizer. This controls how much to adjust the model parameters in response to the estimated error each time the model weights are updated. A lower value means slower learning, while a higher value means faster learning.

learning_starts: int = 100

Number of timesteps before learning starts. This is the number of steps to collect in the replay buffer before the first update to the policy. This allows the agent to gather initial experience and helps to stabilize training by ensuring that there are enough samples to learn from.

property name: str
optimize_memory_usage: bool = False

Whether to optimize memory usage for the replay buffer. When set to True, it will use a more memory-efficient implementation of the replay buffer, which can help to reduce memory consumption during training. This is particularly useful when working with large environments or limited hardware resources. Note that this may slightly affect the performance of the training process, as it may introduce some overhead in accessing the samples.

replay_buffer_class: Any = None

Class to use for the replay buffer. This allows for customization of the replay buffer used for training. By default, it will use the standard ReplayBuffer class provided by Stable Baselines3. However, you can specify a custom class that inherits from ReplayBuffer to implement your own functionality or behavior for storing and sampling transitions.

replay_buffer_kwargs: dict = None

Additional keyword arguments to pass to the replay buffer constructor. This allows for further customization of the replay buffer’s behavior and settings when it is instantiated. For example, you can specify parameters like buffer_size, seed, or any other parameters supported by your custom replay buffer class. This can help to tailor the replay buffer to your specific needs or environment requirements.
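A sketch of using both fields together, assuming the standard Stable Baselines3 ReplayBuffer and that its handle_timeout_termination option is the one you want to change; a custom buffer class would accept its own keyword arguments here instead.

    from stable_baselines3.common.buffers import ReplayBuffer
    from schola.scripts.sb3.settings import SACSettings

    # Explicitly select the standard SB3 replay buffer and tweak one of its options.
    settings = SACSettings(
        replay_buffer_class=ReplayBuffer,
        replay_buffer_kwargs={"handle_timeout_termination": False},
    )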

sde_sample_freq: int = -1

Frequency at which to sample the SDE noise. This determines how often a new noise matrix is sampled when using State Dependent Exploration (SDE). A value of -1 means the noise is sampled only at the beginning of each rollout, while a positive integer specifies the number of steps between fresh samples. More frequent resampling gives more varied exploration, while less frequent resampling yields more stable but less exploratory behavior.

target_entropy: Any = 'auto'

Target entropy for the entropy regularization. This sets the target for the average entropy of the actions taken by the policy and is used when ent_coef is learned automatically. When set to 'auto', the target is derived from the action space as the negative of its dimensionality (i.e., -dim(action_space)). This helps balance exploration and exploitation during training by encouraging the agent to keep exploring diverse actions.

target_update_interval: int = 1

Interval for updating the target networks. This determines how often the target networks are updated with the main networks’ weights. A value of 1 means that the target networks are updated every training step, while a higher value (e.g., 2) means that they are updated every other step. This can help to control the stability of training by ensuring that the target networks are kept up-to-date with the latest policy parameters.

tau: float = 0.005

Soft update parameter for the target networks. This controls how much the target networks are updated towards the main networks during training. A smaller value (e.g., 0.005) means that the target networks are updated slowly, which can help to stabilize training. This is typically a small value between 0 and 1.
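Concretely, the target parameters follow the standard Polyak (soft) update

    \theta_{\text{target}} \leftarrow \tau \, \theta + (1 - \tau) \, \theta_{\text{target}}

so the default tau = 0.005 moves the target networks 0.5% of the way toward the online networks at each update.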

train_freq: int = 1

Frequency of training the policy. This determines how often the model is updated relative to environment steps. A value of 1 means that an update is performed every time step, while a higher value (e.g., 2) means that an update is performed every other time step. This controls the trade-off between how much new experience is collected and how much compute is spent on updates.

use_sde: bool = False

Whether to use State Dependent Exploration (SDE). This can help to improve exploration by adapting the exploration noise based on the current state of the environment. When set to True, it will use SDE for exploration instead of the standard exploration strategy. This can lead to more efficient exploration in complex environments, but may also introduce additional computational overhead.
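A minimal sketch of enabling SDE together with a resampling interval; the value 4 is purely illustrative.

    from schola.scripts.sb3.settings import SACSettings

    # Use State Dependent Exploration and resample its noise every 4 steps.
    settings = SACSettings(use_sde=True, sde_sample_freq=4)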

Related pages

  • Visit the Schola product page for download links and more information.
