
SACSettings

Full path: schola.scripts.rllib.settings.SACSettings


SACSettings(tau=0.005, target_entropy="auto", initial_alpha=1.0, n_step=1, twin_q=True)

Bases: RLLibAlgorithmSpecificSettings

Dataclass for SAC (Soft Actor-Critic) algorithm-specific settings. It defines the parameters used by the SAC algorithm, including soft target-network updates and entropy regularization.
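As a rough sketch of the documented interface, the fields and defaults below come from this page, while the class body is a simplified stand-in (the real class uses Annotated/Parameter validators rather than a `__post_init__` check):

```python
from dataclasses import dataclass


@dataclass
class SACSettingsSketch:
    """Minimal stand-in mirroring the documented SACSettings fields."""

    tau: float = 0.005            # soft target-update coefficient, 0.0 <= tau <= 1.0
    target_entropy: str = "auto"  # "auto", or a value for manual control
    initial_alpha: float = 1.0    # entropy temperature, must be > 0
    n_step: int = 1               # n-step returns, must be >= 1
    twin_q: bool = True           # use twin Q networks (double Q-learning)

    def __post_init__(self):
        # Simplified versions of the documented validator constraints.
        if not (0.0 <= self.tau <= 1.0):
            raise ValueError("tau must be in [0.0, 1.0]")
        if self.initial_alpha <= 0:
            raise ValueError("initial_alpha must be > 0")
        if self.n_step < 1:
            raise ValueError("n_step must be >= 1")


# Defaults can be overridden per experiment, e.g. faster target tracking
# and 3-step returns:
settings = SACSettingsSketch(tau=0.01, n_step=3)
```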

Methods

__init__() - Initialize the SAC settings.
get_parser() - Add the settings to the parser or subparser.
get_settings_dict() - Get the settings as a dictionary keyed by the correct parameter name in Ray.

Attributes

initial_alpha - Initial temperature/alpha value for entropy regularization.
n_step - Number of steps for n-step returns.
name - Name of the algorithm.
rllib_config - The Ray RLlib config class (SACConfig) for this algorithm.
target_entropy - Target entropy for automatic temperature tuning.
tau - Soft update coefficient for target networks.
twin_q - Whether to use twin Q networks (double Q-learning).

Parameters

tau (float, constrained to 0.0 <= tau <= 1.0)

target_entropy (str)

initial_alpha (float, constrained to initial_alpha > 0)

n_step (int, constrained to n_step >= 1)

twin_q (bool)

__init__

__init__(tau=0.005, target_entropy="auto", initial_alpha=1.0, n_step=1, twin_q=True)

Parameters

tau (float, constrained to 0.0 <= tau <= 1.0)

target_entropy (str)

initial_alpha (float, constrained to initial_alpha > 0)

n_step (int, constrained to n_step >= 1)

twin_q (bool)

Returns

None


get_settings_dict

get_settings_dict()

Get the settings as a dictionary keyed by the correct parameter name in Ray.
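A sketch of the dictionary this plausibly returns for the default settings. The exact keys are an assumption based on the attribute names documented on this page; "keyed by the correct parameter name in Ray" suggests they match keyword arguments of RLlib's `SACConfig.training()`:

```python
# Hypothetical output shape for the default settings (the real method's
# key names are an assumption, inferred from this page's attributes).
settings_dict = {
    "tau": 0.005,
    "target_entropy": "auto",
    "initial_alpha": 1.0,
    "n_step": 1,
    "twin_q": True,
}

# These names match keyword arguments of
# ray.rllib.algorithms.sac.SACConfig.training(), so a dict shaped like
# this could be splatted into the config, e.g.:
#   config = SACConfig().training(**settings_dict)
```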


initial_alpha

initial_alpha: Annotated[float, Parameter(validator=Number(gt=0))] = 1.0

Initial temperature/alpha value for entropy regularization. Higher values encourage more exploration.


n_step

n_step: Annotated[int, Parameter(validator=Number(gte=1))] = 1

Number of steps for n-step returns. Using n>1 can help with credit assignment in sparse reward environments.
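For intuition, an n-step return folds the next n rewards into a single backup target before bootstrapping. A minimal sketch (the discount factor and reward values here are purely illustrative):

```python
def n_step_return(rewards, gamma, bootstrap_value, n):
    """Sum of the first n discounted rewards plus a discounted bootstrap."""
    partial = sum(gamma**i * r for i, r in enumerate(rewards[:n]))
    return partial + gamma**n * bootstrap_value


# With n=1 only the immediate reward and the bootstrap are used;
# n=3 propagates reward information three steps back in one update,
# which helps when rewards are sparse.
rewards = [1.0, 0.0, 2.0]
one_step = n_step_return(rewards, 0.99, 5.0, 1)    # 1.0 + 0.99 * 5.0
three_step = n_step_return(rewards, 0.99, 5.0, 3)  # 1.0 + 0.99**2 * 2.0 + 0.99**3 * 5.0
```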


name

name: str

rllib_config

rllib_config: Type[SACConfig]

target_entropy

target_entropy: str = 'auto'

Target entropy for automatic temperature tuning. Set to 'auto' to calculate it automatically from the action-space dimensionality, or provide a float value for manual control.
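The common heuristic for 'auto' in continuous action spaces, used by the original SAC paper and many implementations, is minus the action dimensionality. Whether Schola/RLlib computes exactly this is an assumption based on standard SAC practice:

```python
def auto_target_entropy(action_dim):
    # Common SAC heuristic for continuous action spaces: -|A|, i.e. one
    # nat of entropy budget per action dimension. (Assumption: this page
    # does not spell out the exact formula used when 'auto' is set.)
    return -float(action_dim)


# e.g. a 6-DoF continuous-control task
entropy_target = auto_target_entropy(6)  # -> -6.0
```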


tau

tau: Annotated[float, Parameter(validator=Number(gte=0.0, lte=1.0))] = 0.005

Soft update coefficient for target networks. Controls how quickly target networks track the main networks. Lower values (e.g., 0.005) mean slower updates, which can improve stability.
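The soft update is Polyak averaging: each target weight moves a fraction tau of the way toward the corresponding main-network weight. A minimal sketch over plain Python lists (real implementations operate on framework tensors):

```python
def soft_update(target_params, main_params, tau):
    """Polyak averaging: target <- tau * main + (1 - tau) * target."""
    return [tau * m + (1.0 - tau) * t
            for t, m in zip(target_params, main_params)]


target = [0.0, 1.0]
main = [1.0, 0.0]
# With tau = 0.005, each target weight moves only 0.5% of the way toward
# the main weight per update, so the target network changes slowly.
updated = soft_update(target, main, 0.005)
```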


twin_q

twin_q: bool = True

Whether to use twin Q networks (double Q-learning). This helps reduce overestimation bias in Q-value estimates.
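The mechanism is the clipped double-Q trick: two critics are trained independently and the smaller of their estimates is used when forming the Bellman target, so a single critic's optimistic error cannot inflate the target:

```python
def twin_q_value(q1_estimate, q2_estimate):
    """Clipped double-Q: use the smaller of the two critic estimates."""
    return min(q1_estimate, q2_estimate)


# If one critic overestimates (12.0 against a true value near 10.0)
# while the other is closer (10.2), the min keeps the target conservative.
target_q = twin_q_value(12.0, 10.2)  # -> 10.2
```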