SACSettings
Full path:
schola.scripts.rllib.settings.SACSettings
SACSettings(tau=0.005, target_entropy="auto", initial_alpha=1.0, n_step=1, twin_q=True)
Bases: RLLibAlgorithmSpecificSettings
Dataclass for SAC (Soft Actor-Critic) algorithm specific settings. This class defines the parameters used in the SAC algorithm, including soft target network updates and entropy regularization.
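As a rough sketch of how this dataclass is typically used, the standalone replica below mirrors the fields and defaults documented on this page (the real class lives in schola.scripts.rllib.settings and additionally integrates with an argument parser; the dictionary keys shown are an assumed mapping onto Ray's SACConfig parameter names):

```python
from dataclasses import dataclass


@dataclass
class SACSettingsSketch:
    """Standalone stand-in for schola's SACSettings, for illustration only."""
    tau: float = 0.005
    target_entropy: str = "auto"
    initial_alpha: float = 1.0
    n_step: int = 1
    twin_q: bool = True

    def get_settings_dict(self):
        # Keyed by the parameter names Ray's SACConfig is assumed to expect.
        return {
            "tau": self.tau,
            "target_entropy": self.target_entropy,
            "initial_alpha": self.initial_alpha,
            "n_step": self.n_step,
            "twin_q": self.twin_q,
        }


# Override a couple of defaults, keep the rest.
settings = SACSettingsSketch(tau=0.01, n_step=3)
print(settings.get_settings_dict())
```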
Methods
| Item | Description |
|---|---|
| __init__ | — |
| get_parser | Add the settings to the parser or subparser. |
| get_settings_dict | Get the settings as a dictionary keyed by the correct parameter name in Ray. |
Attributes
| Item | Description |
|---|---|
| initial_alpha | Initial temperature/alpha value for entropy regularization. |
| n_step | Number of steps for n-step returns. |
| name | — |
| rllib_config | — |
| target_entropy | Target entropy for automatic temperature tuning. |
| tau | Soft update coefficient for target networks. |
| twin_q | Whether to use twin Q networks (double Q-learning). |
Parameters
tau (Annotated[float, Parameter(validator=(Number(lt=None, lte=1.0, gt=None, gte=0.0, modulo=None),))])
target_entropy (str)
initial_alpha (Annotated[float, Parameter(validator=(Number(lt=None, lte=None, gt=0, gte=None, modulo=None),))])
n_step (Annotated[int, Parameter(validator=(Number(lt=None, lte=None, gt=None, gte=1, modulo=None),))])
twin_q (bool)
__init__
__init__(tau=0.005, target_entropy="auto", initial_alpha=1.0, n_step=1, twin_q=True)
Parameters
tau (Annotated[float, Parameter(validator=(Number(lt=None, lte=1.0, gt=None, gte=0.0, modulo=None),))])
target_entropy (str)
initial_alpha (Annotated[float, Parameter(validator=(Number(lt=None, lte=None, gt=0, gte=None, modulo=None),))])
n_step (Annotated[int, Parameter(validator=(Number(lt=None, lte=None, gt=None, gte=1, modulo=None),))])
twin_q (bool)
Returns
None
get_settings_dict
get_settings_dict()
Get the settings as a dictionary keyed by the correct parameter name in Ray.
initial_alpha
initial_alpha: Annotated[float, Parameter(validator=Number(lt=None, lte=None, gt=0, gte=None, modulo=None))] = 1.0
Initial temperature/alpha value for entropy regularization. Higher values encourage more exploration.
n_step
n_step: Annotated[int, Parameter(validator=Number(lt=None, lte=None, gt=None, gte=1, modulo=None))] = 1
Number of steps for n-step returns. Using n > 1 can help with credit assignment in sparse-reward environments.
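To see why n_step > 1 helps with credit assignment, here is the standard n-step return (general RL formula, not Schola-specific code): the first n rewards reach the current state directly instead of having to propagate back one bootstrapped update at a time.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99, n_step=3):
    """Discounted sum of the first n rewards plus a bootstrapped tail value."""
    ret = 0.0
    for i, r in enumerate(rewards[:n_step]):
        ret += (gamma ** i) * r
    # Bootstrap with the value estimate n steps ahead.
    ret += (gamma ** n_step) * bootstrap_value
    return ret


# With n_step=3, a reward arriving three steps away contributes immediately.
print(n_step_return([0.0, 0.0, 1.0], bootstrap_value=0.0, gamma=0.9, n_step=3))
```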
name
name: str
rllib_config
rllib_config: Type[SACConfig]
target_entropy
= 'auto' target_entropy: strTarget entropy for automatic temperature tuning. Set to ‘auto’ to automatically calculate based on action space dimensionality, or provide a float value for manual control.
tau
tau: Annotated[float, Parameter(validator=Number(lt=None, lte=1.0, gt=None, gte=0.0, modulo=None))] = 0.005
Soft update coefficient for target networks. Controls how quickly the target networks track the main networks. Lower values (e.g., 0.005) mean slower updates, which can improve stability.
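The soft update itself is Polyak averaging, target ← tau · main + (1 − tau) · target, applied per parameter after each update step. A minimal sketch on plain Python lists (frameworks apply the same rule tensor-by-tensor):

```python
def soft_update(target_params, main_params, tau=0.005):
    # Polyak averaging: each target parameter drifts toward its main
    # counterpart by a fraction tau per call.
    return [tau * m + (1.0 - tau) * t for t, m in zip(target_params, main_params)]


# With tau=0.005 the target moves only 0.5% of the way per step.
print(soft_update([0.0], [1.0], tau=0.005))
```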
twin_q
twin_q: bool = True
Whether to use twin Q networks (double Q-learning). This helps reduce overestimation bias in Q-value estimates.
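The twin-Q trick and the temperature alpha both enter the SAC value target: with two critics, the pessimistic minimum of their estimates is used, and the entropy term alpha · log π is subtracted. A minimal sketch of that combination (hypothetical helper, not schola or Ray API):

```python
def sac_value_target(q1, q2, log_pi, alpha=1.0, twin_q=True):
    # With twin_q=True, take the minimum of the two critics' estimates
    # to counter overestimation bias; then subtract the entropy term
    # alpha * log_pi, as in the SAC soft value.
    q = min(q1, q2) if twin_q else q1
    return q - alpha * log_pi


print(sac_value_target(3.0, 2.0, log_pi=-0.5, alpha=1.0))  # 2.0 - (-0.5) = 2.5
```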