schola.sb3.action_space_patch.PatchedPPO
class schola.sb3.action_space_patch.PatchedPPO(policy, env, learning_rate=0.0003, n_steps=2048, batch_size=64, n_epochs=10, gamma=0.99, gae_lambda=0.95, clip_range=0.2, clip_range_vf=None, normalize_advantage=True, ent_coef=0.0, vf_coef=0.5, max_grad_norm=0.5, use_sde=False, sde_sample_freq=-1, target_kl=None, stats_window_size=100, tensorboard_log=None, policy_kwargs=None, verbose=0, seed=None, device='auto', _init_setup_model=True)
Bases: PPO
Methods
| Method | Description |
| --- | --- |
| __init__(policy, env[, learning_rate, ...]) | |
| collect_rollouts(env, callback, ...) | Collect experiences using the current policy and fill a RolloutBuffer. |
| get_env() | Return the current environment (can be None if not defined). |
| get_parameters() | Return the parameters of the agent. |
| get_vec_normalize_env() | Return the VecNormalize wrapper of the training env if it exists. |
| learn(total_timesteps[, callback, ...]) | Return a trained model. |
| load(path[, env, device, custom_objects, ...]) | Load the model from a zip-file. |
| predict(observation[, state, episode_start, ...]) | Get the policy action from an observation (and optional hidden state). |
| save(path[, exclude, include]) | Save all the attributes of the object and the model parameters in a zip-file. |
| set_env(env[, force_reset]) | Check the validity of the environment and, if it is coherent, set it as the current environment. |
| set_logger(logger) | Setter for the logger object. |
| set_parameters(load_path_or_dict[, ...]) | Load parameters from a given zip-file or a nested dictionary containing parameters for different modules (see get_parameters). |
| set_random_seed([seed]) | Set the seed of the pseudo-random generators (python, numpy, pytorch, gym, action_space). |
| train() | Update policy using the currently gathered rollout buffer. |
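PatchedPPO keeps the standard PPO interface, so the usual construct / learn / save / predict workflow applies unchanged. A minimal sketch, assuming a generic Gymnasium environment as a stand-in for whatever environment you actually train against (the environment id and file name below are placeholders, not part of Schola):

```python
import gymnasium as gym

from schola.sb3.action_space_patch import PatchedPPO

# Placeholder environment; substitute the environment exposed by your setup.
env = gym.make("CartPole-v1")

# "MlpPolicy" is the standard SB3 actor-critic policy alias inherited from PPO.
model = PatchedPPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)  # returns the trained model
model.save("patched_ppo_demo")       # writes attributes and parameters to a zip-file

# predict() maps an observation to an action (deterministic greedy action here).
obs, _ = env.reset()
action, _state = model.predict(obs, deterministic=True)
```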
Attributes
| Attribute | Description |
| --- | --- |
| logger | Getter for the logger object. |
| policy_aliases | |
| rollout_buffer | |
| policy | |
| observation_space | |
| action_space | |
| n_envs | |
| lr_schedule | |
Parameters:
- policy (ActorCriticPolicy)
- env (Env | VecEnv | str)
- learning_rate (float | Callable[[float], float])
- n_steps (int)
- batch_size (int)
- n_epochs (int)
- gamma (float)
- gae_lambda (float)
- clip_range (float | Callable[[float], float])
- clip_range_vf (None | float | Callable[[float], float])
- normalize_advantage (bool)
- ent_coef (float)
- vf_coef (float)
- max_grad_norm (float)
- use_sde (bool)
- sde_sample_freq (int)
- target_kl (float | None)
- stats_window_size (int)
- tensorboard_log (str | None)
- policy_kwargs (Dict[str, Any] | None)
- verbose (int)
- seed (int | None)
- device (device | str)
- _init_setup_model (bool)
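learning_rate, clip_range, and clip_range_vf accept either a fixed float or a Callable[[float], float] that maps the remaining training progress (1.0 at the start, 0.0 at the end) to the current value. A hedged sketch of a linear schedule; the helper function and the placeholder environment below are illustrative and not part of the class:

```python
import gymnasium as gym

from schola.sb3.action_space_patch import PatchedPPO


def linear_schedule(initial_value: float):
    """Build a Callable[[float], float] that decays linearly with training progress."""
    def schedule(progress_remaining: float) -> float:
        # progress_remaining goes from 1.0 (start of training) to 0.0 (end).
        return progress_remaining * initial_value
    return schedule


env = gym.make("CartPole-v1")  # placeholder environment

model = PatchedPPO(
    "MlpPolicy",
    env,
    learning_rate=linear_schedule(3e-4),  # callable schedule instead of a fixed float
    clip_range=linear_schedule(0.2),
    n_steps=2048,
    batch_size=64,
)
```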
__init__(policy, env, learning_rate=0.0003, n_steps=2048, batch_size=64, n_epochs=10, gamma=0.99, gae_lambda=0.95, clip_range=0.2, clip_range_vf=None, normalize_advantage=True, ent_coef=0.0, vf_coef=0.5, max_grad_norm=0.5, use_sde=False, sde_sample_freq=-1, target_kl=None, stats_window_size=100, tensorboard_log=None, policy_kwargs=None, verbose=0, seed=None, device='auto', _init_setup_model=True)

Parameters:
- policy (str | Type[ActorCriticPolicy])
- env (Env | VecEnv | str)
- learning_rate (float | Callable[[float], float])
- n_steps (int)
- batch_size (int)
- n_epochs (int)
- gamma (float)
- gae_lambda (float)
- clip_range (float | Callable[[float], float])
- clip_range_vf (None | float | Callable[[float], float])
- normalize_advantage (bool)
- ent_coef (float)
- vf_coef (float)
- max_grad_norm (float)
- use_sde (bool)
- sde_sample_freq (int)
- target_kl (float | None)
- stats_window_size (int)
- tensorboard_log (str | None)
- policy_kwargs (Dict[str, Any] | None)
- verbose (int)
- seed (int | None)
- device (device | str)
- _init_setup_model (bool)
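load, get_parameters, and set_parameters are inherited from PPO, so restoring a saved PatchedPPO follows the usual SB3 round-trip. A minimal sketch; the file path below is a placeholder:

```python
from schola.sb3.action_space_patch import PatchedPPO

# Path is a placeholder; pass env=... if you intend to continue training.
model = PatchedPPO.load("patched_ppo_demo", device="auto")

# get_parameters()/set_parameters() expose the same state that the zip-file stores,
# which allows copying weights between two instances of the model.
params = model.get_parameters()
other = PatchedPPO.load("patched_ppo_demo")
other.set_parameters(params)
```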