PPOSettings

Full path: schola.scripts.rllib.settings.PPOSettings

PPOSettings(gae_lambda=0.95, clip_param=0.2, use_gae=True)

Bases: RLLibAlgorithmSpecificSettings

Dataclass for PPO (Proximal Policy Optimization) algorithm specific settings. This class defines the parameters used in the PPO algorithm, including GAE lambda, clip parameter, and whether to use GAE.

Methods

Item	Description
__init__	—
get_parser	Add the settings to the parser or subparser.
get_settings_dict	Get the settings as a dictionary keyed by the correct parameter name in Ray.

Attributes

Item	Description
clip_param	The clip parameter for the PPO algorithm.
gae_lambda	The lambda parameter for Generalized Advantage Estimation (GAE).
name	—
rllib_config	—
use_gae	Whether to use Generalized Advantage Estimation (GAE) for advantage calculation.

Parameters

gae_lambda (Annotated[float, Parameter(validator=(Number(lt=None, lte=1.0, gt=None, gte=0.0, modulo=None),))])

clip_param (Annotated[float, Parameter(validator=(Number(lt=None, lte=None, gt=None, gte=0, modulo=None),))])

use_gae (bool)

__init__

__init__(gae_lambda=0.95, clip_param=0.2, use_gae=True)

Parameters

gae_lambda (Annotated[float, Parameter(validator=(Number(lt=None, lte=1.0, gt=None, gte=0.0, modulo=None),))])

clip_param (Annotated[float, Parameter(validator=(Number(lt=None, lte=None, gt=None, gte=0, modulo=None),))])

use_gae (bool)

Returns

None

clip_param

clip_param: Annotated[float, Parameter(validator=Number(lt=None, lte=None, gt=None, gte=0, modulo=None))] = 0.2

The clip parameter for the PPO algorithm. This is the epsilon value used in the clipped surrogate objective function: the probability ratio between the new and old policy is clipped to the range [1 - epsilon, 1 + epsilon], which limits the policy update step size and prevents large changes that could lead to performance collapse. The validator requires a value greater than or equal to 0.
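To make the role of clip_param concrete, here is a minimal per-sample sketch of the clipped surrogate objective in plain Python. The function name `clipped_surrogate` and its arguments are illustrative only; this is not Schola's or RLlib's implementation, which operates on batched tensors.

```python
def clipped_surrogate(ratio: float, advantage: float, clip_param: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate objective (a quantity to maximize).

    ratio      -- pi_new(a|s) / pi_old(a|s), the policy probability ratio
    advantage  -- advantage estimate for the sampled action
    clip_param -- epsilon; the ratio is clipped to [1 - eps, 1 + eps]
    """
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    # Taking the minimum makes the objective pessimistic: moving the ratio
    # beyond the clip range cannot increase the objective any further.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage, a ratio above 1 + clip_param earns no extra objective value, so the update has no incentive to push the policy further in that direction. A smaller clip_param therefore means more conservative updates.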

gae_lambda

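gae_lambda: Annotated[float, Parameter(validator=Number(lt=None, lte=1.0, gt=None, gte=0.0, modulo=None))] = 0.95

The lambda parameter for Generalized Advantage Estimation (GAE). This controls the trade-off between bias and variance in the advantage estimation: values near 0 rely mostly on the value function (lower variance, higher bias), while values near 1 rely mostly on observed returns (lower bias, higher variance). The validator requires a value between 0.0 and 1.0 inclusive.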
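The effect of gae_lambda can be illustrated with a small standalone GAE computation. This is a sketch under simplifying assumptions: a single trajectory segment with no episode termination inside it, a hypothetical function name `compute_gae`, and a discount factor `gamma` that is not part of PPOSettings. It is not the code RLlib runs.

```python
from typing import List, Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> List[float]:
    """Compute GAE advantages for one trajectory segment.

    values[t] is the value estimate for the state at step t, and
    last_value is the estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated estimate.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of residuals, with weight gamma * lambda.
        gae = delta + gamma * gae_lambda * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

With gae_lambda=0 each advantage reduces to the one-step TD residual, and with gae_lambda=1 it becomes the discounted sum of all later residuals, which matches the bias and variance extremes described above.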

get_settings_dict

get_settings_dict()

Get the settings as a dictionary keyed by the correct parameter name in Ray.
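As a rough illustration of what "keyed by the correct parameter name in Ray" can mean, the sketch below uses a stand-in dataclass rather than the real Schola class. The class name `PPOSettingsSketch`, the method name `to_ray_kwargs`, and the exact key mapping are assumptions: the keys follow the `lambda_`, `clip_param`, and `use_gae` arguments of RLlib's `PPOConfig.training()`, and the range checks mirror the validators listed under Parameters. The dictionary actually returned by get_settings_dict may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class PPOSettingsSketch:
    """Hypothetical stand-in for PPOSettings; not the Schola implementation."""

    gae_lambda: float = 0.95
    clip_param: float = 0.2
    use_gae: bool = True

    def __post_init__(self) -> None:
        # Mirror the documented validators: 0.0 <= gae_lambda <= 1.0, clip_param >= 0.
        if not 0.0 <= self.gae_lambda <= 1.0:
            raise ValueError("gae_lambda must be between 0.0 and 1.0")
        if self.clip_param < 0:
            raise ValueError("clip_param must be >= 0")

    def to_ray_kwargs(self) -> Dict[str, Any]:
        # Assumed mapping: RLlib names the GAE lambda argument `lambda_`,
        # so the field is renamed; the other two keep their names.
        return {
            "lambda_": self.gae_lambda,
            "clip_param": self.clip_param,
            "use_gae": self.use_gae,
        }
```

A dictionary in this shape could then be unpacked into an RLlib config call such as `config.training(**settings)`, assuming the installed Ray version accepts these argument names.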

name

name: str

rllib_config

rllib_config: Type[PPOConfig]

use_gae

use_gae: bool = True

Whether to use Generalized Advantage Estimation (GAE) for advantage calculation. GAE is a method to reduce the variance of the advantage estimates while keeping bias low. If set to False, the standard advantage calculation will be used instead.