schola.sb3.action_space_patch.PatchedPPO
class schola.sb3.action_space_patch.PatchedPPO(policy, env, learning_rate=0.0003, n_steps=2048, batch_size=64, n_epochs=10, gamma=0.99, gae_lambda=0.95, clip_range=0.2, clip_range_vf=None, normalize_advantage=True, ent_coef=0.0, vf_coef=0.5, max_grad_norm=0.5, use_sde=False, sde_sample_freq=-1, target_kl=None, stats_window_size=100, tensorboard_log=None, policy_kwargs=None, verbose=0, seed=None, device='auto', _init_setup_model=True)
Bases: PPO
Methods
| Method | Description |
| --- | --- |
| __init__(policy, env[, learning_rate, ...]) | |
| collect_rollouts(env, callback, ...) | Collect experiences using the current policy and fill a RolloutBuffer. |
| get_env() | Return the current environment (can be None if not defined). |
| get_parameters() | Return the parameters of the agent. |
| get_vec_normalize_env() | Return the VecNormalize wrapper of the training env if it exists. |
| learn(total_timesteps[, callback, ...]) | Return a trained model. |
| load(path[, env, device, custom_objects, ...]) | Load the model from a zip-file. |
| predict(observation[, state, episode_start, ...]) | Get the policy action from an observation (and optional hidden state). |
| save(path[, exclude, include]) | Save all the attributes of the object and the model parameters in a zip-file. |
| set_env(env[, force_reset]) | Check the validity of the environment and, if it is coherent, set it as the current environment. |
| set_logger(logger) | Setter for the logger object. |
| set_parameters(load_path_or_dict[, ...]) | Load parameters from a given zip-file or a nested dictionary containing parameters for different modules (see get_parameters). |
| set_random_seed([seed]) | Set the seed of the pseudo-random generators (python, numpy, pytorch, gym, action_space). |
| train() | Update policy using the currently gathered rollout buffer. |
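PatchedPPO keeps the standard PPO interface, so the usual construct / learn / save / predict workflow applies unchanged. A minimal sketch, assuming a generic Gymnasium environment as a stand-in for whatever environment you actually train against (the environment id and file name below are placeholders, not part of Schola):

```python
import gymnasium as gym

from schola.sb3.action_space_patch import PatchedPPO

# Placeholder environment; substitute the environment exposed by your setup.
env = gym.make("CartPole-v1")

# "MlpPolicy" is the standard SB3 actor-critic policy alias inherited from PPO.
model = PatchedPPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)  # returns the trained model
model.save("patched_ppo_demo")       # writes attributes and parameters to a zip-file

# predict() maps an observation to an action (deterministic greedy action here).
obs, _ = env.reset()
action, _state = model.predict(obs, deterministic=True)
```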
Attributes
| Attribute | Description |
| --- | --- |
| logger | Getter for the logger object. |
| policy_aliases | |
| rollout_buffer | |
| policy | |
| observation_space | |
| action_space | |
| n_envs | |
| lr_schedule | |
Parameters:
- policy (ActorCriticPolicy)
- env (Env | VecEnv | str)
- learning_rate (float | Callable[[float], float])
- n_steps (int)
- batch_size (int)
- n_epochs (int)
- gamma (float)
- gae_lambda (float)
- clip_range (float | Callable[[float], float])
- clip_range_vf (None | float | Callable[[float], float])
- normalize_advantage (bool)
- ent_coef (float)
- vf_coef (float)
- max_grad_norm (float)
- use_sde (bool)
- sde_sample_freq (int)
- target_kl (float | None)
- stats_window_size (int)
- tensorboard_log (str | None)
- policy_kwargs (Dict[str, Any] | None)
- verbose (int)
- seed (int | None)
- device (device | str)
- _init_setup_model (bool)
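learning_rate, clip_range, and clip_range_vf accept either a fixed float or a Callable[[float], float] that maps the remaining training progress (1.0 at the start, 0.0 at the end) to the current value. A hedged sketch of a linear schedule; the helper function and the placeholder environment below are illustrative and not part of the class:

```python
import gymnasium as gym

from schola.sb3.action_space_patch import PatchedPPO


def linear_schedule(initial_value: float):
    """Build a Callable[[float], float] that decays linearly with training progress."""
    def schedule(progress_remaining: float) -> float:
        # progress_remaining goes from 1.0 (start of training) to 0.0 (end).
        return progress_remaining * initial_value
    return schedule


env = gym.make("CartPole-v1")  # placeholder environment

model = PatchedPPO(
    "MlpPolicy",
    env,
    learning_rate=linear_schedule(3e-4),  # callable schedule instead of a fixed float
    clip_range=linear_schedule(0.2),
    n_steps=2048,
    batch_size=64,
)
```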
__init__(policy, env, learning_rate=0.0003, n_steps=2048, batch_size=64, n_epochs=10, gamma=0.99, gae_lambda=0.95, clip_range=0.2, clip_range_vf=None, normalize_advantage=True, ent_coef=0.0, vf_coef=0.5, max_grad_norm=0.5, use_sde=False, sde_sample_freq=-1, target_kl=None, stats_window_size=100, tensorboard_log=None, policy_kwargs=None, verbose=0, seed=None, device='auto', _init_setup_model=True)

Parameters:
- policy (str | Type[ActorCriticPolicy])
- env (Env | VecEnv | str)
- learning_rate (float | Callable[[float], float])
- n_steps (int)
- batch_size (int)
- n_epochs (int)
- gamma (float)
- gae_lambda (float)
- clip_range (float | Callable[[float], float])
- clip_range_vf (None | float | Callable[[float], float])
- normalize_advantage (bool)
- ent_coef (float)
- vf_coef (float)
- max_grad_norm (float)
- use_sde (bool)
- sde_sample_freq (int)
- target_kl (float | None)
- stats_window_size (int)
- tensorboard_log (str | None)
- policy_kwargs (Dict[str, Any] | None)
- verbose (int)
- seed (int | None)
- device (device | str)
- _init_setup_model (bool)
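load, get_parameters, and set_parameters are inherited from PPO, so restoring a saved PatchedPPO follows the usual SB3 round-trip. A minimal sketch; the file path below is a placeholder:

```python
from schola.sb3.action_space_patch import PatchedPPO

# Path is a placeholder; pass env=... if you intend to continue training.
model = PatchedPPO.load("patched_ppo_demo", device="auto")

# get_parameters()/set_parameters() expose the same state that the zip-file stores,
# which allows copying weights between two instances of the model.
params = model.get_parameters()
other = PatchedPPO.load("patched_ppo_demo")
other.set_parameters(params)
```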