
schola.sb3.action_space_patch.PatchedPPO

class schola.sb3.action_space_patch.PatchedPPO(policy, env, learning_rate=0.0003, n_steps=2048, batch_size=64, n_epochs=10, gamma=0.99, gae_lambda=0.95, clip_range=0.2, clip_range_vf=None, normalize_advantage=True, ent_coef=0.0, vf_coef=0.5, max_grad_norm=0.5, use_sde=False, sde_sample_freq=-1, target_kl=None, stats_window_size=100, tensorboard_log=None, policy_kwargs=None, verbose=0, seed=None, device='auto', _init_setup_model=True)

Bases: PPO
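Since PatchedPPO subclasses PPO (per the Bases line above), it can be constructed like a standard stable-baselines3 PPO model. A minimal usage sketch, assuming a Gymnasium-compatible environment is available; the environment id and hyperparameter values below are illustrative, not requirements of the class:

```python
import gymnasium as gym

from schola.sb3.action_space_patch import PatchedPPO

# Any Gymnasium-compatible environment works here; CartPole-v1 is only an example.
env = gym.make("CartPole-v1")

# PatchedPPO accepts the same constructor arguments as stable-baselines3's PPO.
model = PatchedPPO(
    "MlpPolicy",          # policy may be a string alias or an ActorCriticPolicy subclass
    env,
    learning_rate=0.0003,
    n_steps=2048,
    batch_size=64,
    verbose=1,
)
```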

Methods

__init__(policy, env[, learning_rate, …])

collect_rollouts(env, callback, …)
    Collect experiences using the current policy and fill a RolloutBuffer.

get_env()
    Returns the current environment (can be None if not defined).

get_parameters()
    Return the parameters of the agent.

get_vec_normalize_env()
    Return the VecNormalize wrapper of the training env if it exists.

learn(total_timesteps[, callback, …])
    Return a trained model.

load(path[, env, device, custom_objects, …])
    Load the model from a zip-file.

predict(observation[, state, episode_start, …])
    Get the policy action from an observation (and optional hidden state).

save(path[, exclude, include])
    Save all the attributes of the object and the model parameters in a zip-file.

set_env(env[, force_reset])
    Checks the validity of the environment and, if it is coherent, sets it as the current environment.

set_logger(logger)
    Setter for the logger object.

set_parameters(load_path_or_dict[, …])
    Load parameters from a given zip-file or a nested dictionary containing parameters for different modules (see get_parameters).

set_random_seed([seed])
    Set the seed of the pseudo-random generators (python, numpy, pytorch, gym, action_space).

train()
    Update policy using the currently gathered rollout buffer.
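Putting the core methods together, a brief end-to-end sketch, continuing the model and env from the construction example above; the timestep budget is arbitrary:

```python
# learn() trains the model and returns it.
model.learn(total_timesteps=10_000)

# Roll out the trained policy for one episode.
obs, info = env.reset()
done = False
while not done:
    # predict() maps an observation to an action; deterministic=True
    # disables exploration noise at evaluation time.
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```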

Attributes

logger
    Getter for the logger object.
policy_aliases
rollout_buffer
policy
observation_space
action_space
n_envs
lr_schedule

Parameters:
    - policy (ActorCriticPolicy)

__init__(policy, env, learning_rate=0.0003, n_steps=2048, batch_size=64, n_epochs=10, gamma=0.99, gae_lambda=0.95, clip_range=0.2, clip_range_vf=None, normalize_advantage=True, ent_coef=0.0, vf_coef=0.5, max_grad_norm=0.5, use_sde=False, sde_sample_freq=-1, target_kl=None, stats_window_size=100, tensorboard_log=None, policy_kwargs=None, verbose=0, seed=None, device='auto', _init_setup_model=True)

Parameters:
    - policy (str | Type[ActorCriticPolicy])
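save() and load() round-trip the model through a zip archive, as listed in the methods table above. A short sketch; the file path is illustrative:

```python
# Persist all attributes and model parameters to a zip file.
model.save("patched_ppo_cartpole")

# Reload later; passing env re-attaches an environment for further training.
loaded = PatchedPPO.load("patched_ppo_cartpole", env=env)
```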