IMPALASettings

Full path: schola.scripts.rllib.settings.IMPALASettings

IMPALASettings(
    vtrace=True, vtrace_clip_rho_threshold=1.0, vtrace_clip_pg_rho_threshold=1.0
)

Bases: RLLibAlgorithmSpecificSettings

Dataclass for settings specific to the IMPALA (Importance Weighted Actor-Learner Architecture) algorithm. It defines the parameters IMPALA uses, including the V-trace settings for off-policy correction.

Methods

Item	Description
__init__	—
get_parser	Add the settings to the parser or subparser
get_settings_dict	Get the settings as a dictionary keyed by the correct parameter name in Ray

Attributes

Item	Description
name	—
rllib_config	—
vtrace	Whether to use the V-trace algorithm for off-policy correction in the IMPALA algorithm.
vtrace_clip_pg_rho_threshold	The clip threshold for V-trace rho values in the policy gradient.
vtrace_clip_rho_threshold	The clip threshold for V-trace rho values.

Parameters

vtrace (bool)

vtrace_clip_rho_threshold (Annotated[float, Parameter(validator=(Number(lt=None, lte=None, gt=None, gte=0, modulo=None),))])

vtrace_clip_pg_rho_threshold (Annotated[float, Parameter(validator=(Number(lt=None, lte=None, gt=None, gte=0, modulo=None),))])

__init__

__init__(vtrace=True, vtrace_clip_rho_threshold=1.0, vtrace_clip_pg_rho_threshold=1.0)

Parameters

vtrace (bool)

vtrace_clip_rho_threshold (Annotated[float, Parameter(validator=(Number(lt=None, lte=None, gt=None, gte=0, modulo=None),))])

vtrace_clip_pg_rho_threshold (Annotated[float, Parameter(validator=(Number(lt=None, lte=None, gt=None, gte=0, modulo=None),))])

Returns

None

get_settings_dict

get_settings_dict()

Get the settings as a dictionary keyed by the correct parameter name in Ray
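The shape of the returned dictionary can be pictured with a minimal sketch. `IMPALASettingsSketch` below is a hypothetical stand-in that mirrors the documented fields and defaults; it is not the Schola class. It assumes that the Ray parameter names match the field names one-to-one, and it imitates the documented `gte=0` validators with a plain check.

```python
from dataclasses import asdict, dataclass


@dataclass
class IMPALASettingsSketch:
    """Hypothetical stand-in mirroring the documented fields; not the Schola class."""

    vtrace: bool = True
    vtrace_clip_rho_threshold: float = 1.0
    vtrace_clip_pg_rho_threshold: float = 1.0

    def __post_init__(self) -> None:
        # Imitate the documented gte=0 validators on both thresholds.
        for field_name in ("vtrace_clip_rho_threshold", "vtrace_clip_pg_rho_threshold"):
            if getattr(self, field_name) < 0:
                raise ValueError(f"{field_name} must be >= 0")

    def get_settings_dict(self) -> dict:
        # Assumption: Ray's parameter names equal the field names.
        return asdict(self)


settings = IMPALASettingsSketch(vtrace_clip_rho_threshold=2.0)
settings_dict = settings.get_settings_dict()
```

With Ray installed, a dictionary of this shape could presumably be unpacked into the training options of an `IMPALAConfig`, which is the type named by `rllib_config`.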

name

name: str

rllib_config

rllib_config: Type[IMPALAConfig]

vtrace

vtrace: bool = True

Whether to use the V-trace algorithm for off-policy correction in the IMPALA algorithm. V-trace corrects the bias introduced when the learner trains on data collected by a policy that lags behind the one being updated. It does so with truncated importance weights, which keeps the value estimates accurate and stable.

vtrace_clip_pg_rho_threshold

vtrace_clip_pg_rho_threshold: Annotated[float, Parameter(validator=Number(lt=None, lte=None, gt=None, gte=0, modulo=None))] = 1.0

The clip threshold for V-trace rho values (importance weights) in the policy gradient. Must be greater than or equal to 0.

vtrace_clip_rho_threshold

vtrace_clip_rho_threshold: Annotated[float, Parameter(validator=Number(lt=None, lte=None, gt=None, gte=0, modulo=None))] = 1.0

The clip threshold for V-trace rho values (importance weights) used when computing the value targets. Must be greater than or equal to 0.
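As a rough illustration of where the two thresholds enter, the sketch below computes V-trace value targets and policy-gradient advantages for a single trajectory in pure Python. It assumes the standard V-trace formulation from the IMPALA paper; the function name `vtrace_targets` and its arguments are hypothetical and are not part of Schola or RLlib. The trace-cutting coefficient is fixed at 1 for simplicity.

```python
import math


def vtrace_targets(log_rhos, rewards, values, bootstrap_value, gamma=0.99,
                   clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0):
    """Return (vs, pg_advantages) for one trajectory (illustrative sketch only).

    log_rhos[t] is log(pi(a_t|x_t) / mu(a_t|x_t)) for target policy pi and
    behaviour policy mu. All sequence arguments are plain lists of equal length.
    """
    n = len(rewards)
    rhos = [math.exp(lr) for lr in log_rhos]

    # Importance weights for the value targets, clipped at clip_rho_threshold.
    clipped_rhos = [min(clip_rho_threshold, r) for r in rhos]
    # Trace-cutting coefficients, clipped at 1 (an assumption of this sketch).
    cs = [min(1.0, r) for r in rhos]

    next_values = values[1:] + [bootstrap_value]
    deltas = [clipped_rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
              for t in range(n)]

    # Backward recursion: vs_t - V_t = delta_t + gamma * c_t * (vs_{t+1} - V_{t+1}).
    acc = 0.0
    vs = [0.0] * n
    for t in reversed(range(n)):
        acc = deltas[t] + gamma * cs[t] * acc
        vs[t] = values[t] + acc

    # Importance weights for the policy gradient, clipped at clip_pg_rho_threshold.
    clipped_pg_rhos = [min(clip_pg_rho_threshold, r) for r in rhos]
    next_vs = vs[1:] + [bootstrap_value]
    pg_advantages = [clipped_pg_rhos[t] * (rewards[t] + gamma * next_vs[t] - values[t])
                     for t in range(n)]
    return vs, pg_advantages
```

In this sketch, raising `clip_pg_rho_threshold` only rescales the advantages fed to the policy gradient, while `clip_rho_threshold` changes the value targets themselves.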