IMPALASettings
Full path:
schola.scripts.rllib.settings.IMPALASettings
IMPALASettings(vtrace=True, vtrace_clip_rho_threshold=1.0, vtrace_clip_pg_rho_threshold=1.0)
Bases: RLLibAlgorithmSpecificSettings
Dataclass for IMPALA (Importance Weighted Actor-Learner Architecture) algorithm specific settings. This class defines the parameters used in the IMPALA algorithm, including V-trace settings for off-policy correction.
Methods
| Item | Description |
|---|---|
| __init__ | Initialize the settings with the V-trace parameters |
| get_parser | Add the settings to the parser or subparser |
| get_settings_dict | Get the settings as a dictionary keyed by the correct parameter name in Ray |
Attributes
| Item | Description |
|---|---|
| name | — |
| rllib_config | — |
| vtrace | Whether to use the V-trace algorithm for off-policy correction in the IMPALA algorithm. |
| vtrace_clip_pg_rho_threshold | The clip threshold for V-trace rho values in the policy gradient. |
| vtrace_clip_rho_threshold | The clip threshold for V-trace rho values. |
Parameters
vtrace (bool)
vtrace_clip_rho_threshold (Annotated[float, Parameter(validator=(Number(lt=None, lte=None, gt=None, gte=0, modulo=None),))])
vtrace_clip_pg_rho_threshold (Annotated[float, Parameter(validator=(Number(lt=None, lte=None, gt=None, gte=0, modulo=None),))])
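The parameter list above can be pictured with a small stand-in dataclass. This is a minimal sketch, not Schola's implementation: `IMPALASettingsSketch` is a hypothetical name, and the construction-time check only imitates the documented `Number(gte=0)` validators, which in the real class are attached through `Parameter` annotations and may run at a different point (for example, during argument parsing).

```python
from dataclasses import dataclass


@dataclass
class IMPALASettingsSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    vtrace: bool = True
    vtrace_clip_rho_threshold: float = 1.0
    vtrace_clip_pg_rho_threshold: float = 1.0

    def __post_init__(self) -> None:
        # Assumption: imitate the documented gte=0 constraint with a plain check.
        for field_name in ("vtrace_clip_rho_threshold", "vtrace_clip_pg_rho_threshold"):
            value = getattr(self, field_name)
            if value < 0:
                raise ValueError(f"{field_name} must be >= 0, got {value}")


# Defaults match the documented signature.
defaults = IMPALASettingsSketch()

# Override a single threshold; the other fields keep their defaults.
custom = IMPALASettingsSketch(vtrace_clip_rho_threshold=2.0)
```

A negative threshold such as `IMPALASettingsSketch(vtrace_clip_rho_threshold=-0.5)` raises `ValueError` in this sketch.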
__init__
__init__(vtrace=True, vtrace_clip_rho_threshold=1.0, vtrace_clip_pg_rho_threshold=1.0)
Parameters
vtrace (bool)
vtrace_clip_rho_threshold (Annotated[float, Parameter(validator=(Number(lt=None, lte=None, gt=None, gte=0, modulo=None),))])
vtrace_clip_pg_rho_threshold (Annotated[float, Parameter(validator=(Number(lt=None, lte=None, gt=None, gte=0, modulo=None),))])
Returns
None
get_settings_dict
get_settings_dict()
Get the settings as a dictionary keyed by the correct parameter name in Ray.
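To illustrate the shape of such a dictionary, here is a rough sketch using a hypothetical stand-in class. It assumes the Ray parameter names coincide with the attribute names documented on this page; the actual keys returned by Schola's `get_settings_dict` are not shown here.

```python
from dataclasses import asdict, dataclass


@dataclass
class IMPALASettingsSketch:
    """Hypothetical stand-in; not the Schola class."""

    vtrace: bool = True
    vtrace_clip_rho_threshold: float = 1.0
    vtrace_clip_pg_rho_threshold: float = 1.0

    def get_settings_dict(self) -> dict:
        # Assumption: each key is the attribute name, used as the Ray parameter name.
        return asdict(self)


settings = IMPALASettingsSketch(vtrace_clip_rho_threshold=2.0).get_settings_dict()
```

A dictionary of this shape could then be unpacked as keyword arguments into an RLlib config call such as `IMPALAConfig().training(**settings)`, provided the installed Ray version accepts these names.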
name
name: str
rllib_config
rllib_config: Type[IMPALAConfig]
vtrace
vtrace: bool = True
Whether to use the V-trace algorithm for off-policy correction in the IMPALA algorithm. V-trace corrects the bias introduced by training on off-policy data, which helps keep the value estimates accurate and stable.
vtrace_clip_pg_rho_threshold
vtrace_clip_pg_rho_threshold: Annotated[float, Parameter(validator=Number(lt=None, lte=None, gt=None, gte=0, modulo=None))] = 1.0
The clip threshold for V-trace rho values in the policy gradient. Must be greater than or equal to 0.
vtrace_clip_rho_threshold
vtrace_clip_rho_threshold: Annotated[float, Parameter(validator=Number(lt=None, lte=None, gt=None, gte=0, modulo=None))] = 1.0
The clip threshold for V-trace rho values. Must be greater than or equal to 0.
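To make the role of the two thresholds concrete, the following is a minimal pure-Python sketch of V-trace targets as described in the IMPALA paper. It is an illustration under stated assumptions, not RLlib's implementation: the function name `vtrace_targets` and its argument layout are hypothetical, the trace coefficients are assumed to be clipped at a fixed 1.0, and `ratios[t]` stands for the importance ratio between the target and behaviour policies for the action taken at step `t`.

```python
def vtrace_targets(ratios, rewards, values, bootstrap_value, gamma=0.99,
                   clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0):
    """Return (value targets, policy-gradient advantages) for one trajectory."""
    n = len(rewards)
    # Importance ratios clipped for the value targets.
    rhos = [min(clip_rho_threshold, r) for r in ratios]
    # Trace coefficients; assumption: clipped at a fixed 1.0.
    cs = [min(1.0, r) for r in ratios]

    next_values = list(values[1:]) + [bootstrap_value]
    deltas = [rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
              for t in range(n)]

    # Backward recursion: vs[t] - V[t] = delta[t] + gamma * c[t] * (vs[t+1] - V[t+1])
    vs = [0.0] * n
    acc = 0.0
    for t in reversed(range(n)):
        acc = deltas[t] + gamma * cs[t] * acc
        vs[t] = values[t] + acc

    # The policy gradient uses its own, separately clipped ratios.
    next_vs = vs[1:] + [bootstrap_value]
    pg_rhos = [min(clip_pg_rho_threshold, r) for r in ratios]
    pg_advantages = [pg_rhos[t] * (rewards[t] + gamma * next_vs[t] - values[t])
                     for t in range(n)]
    return vs, pg_advantages


# With both thresholds at 1.0, ratios above 1 are clipped back to 1.
vs_clipped, adv_clipped = vtrace_targets(
    ratios=[3.0, 3.0], rewards=[1.0, 1.0], values=[0.0, 0.0],
    bootstrap_value=0.0, gamma=0.5)

# Raising only the value-target threshold lets larger ratios through there.
vs_loose, adv_loose = vtrace_targets(
    ratios=[3.0, 3.0], rewards=[1.0, 1.0], values=[0.0, 0.0],
    bootstrap_value=0.0, gamma=0.5, clip_rho_threshold=2.0)
```

In this sketch, `vtrace_clip_rho_threshold` bounds the ratios that enter the value targets, while `vtrace_clip_pg_rho_threshold` bounds the ratios that scale the policy-gradient advantages.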