schola.sb3.action_space_patch.PatchedPPO
class schola.sb3.action_space_patch.PatchedPPO(policy, env, learning_rate=0.0003, n_steps=2048, batch_size=64, n_epochs=10, gamma=0.99, gae_lambda=0.95, clip_range=0.2, clip_range_vf=None, normalize_advantage=True, ent_coef=0.0, vf_coef=0.5, max_grad_norm=0.5, use_sde=False, sde_sample_freq=-1, target_kl=None, stats_window_size=100, tensorboard_log=None, policy_kwargs=None, verbose=0, seed=None, device='auto', _init_setup_model=True)
Bases: PPO
Methods
__init__(policy, env[, learning_rate, …])
collect_rollouts(env, callback, …): Collect experiences using the current policy and fill a RolloutBuffer.
get_env(): Returns the current environment (can be None if not defined).
get_parameters(): Return the parameters of the agent.
get_vec_normalize_env(): Return the VecNormalize wrapper of the training env if it exists.
learn(total_timesteps[, callback, …]): Return a trained model.
load(path[, env, device, custom_objects, …]): Load the model from a zip-file.
predict(observation[, state, episode_start, …]): Get the policy action from an observation (and optional hidden state).
save(path[, exclude, include]): Save all the attributes of the object and the model parameters in a zip-file.
set_env(env[, force_reset]): Checks the validity of the environment and, if it is coherent, sets it as the current environment.
set_logger(logger): Setter for the logger object.
set_parameters(load_path_or_dict[, …]): Load parameters from a given zip-file or a nested dictionary containing parameters for different modules (see get_parameters).
set_random_seed([seed]): Set the seed of the pseudo-random generators (python, numpy, pytorch, gym, action_space).
train(): Update policy using the currently gathered rollout buffer.
Attributes
logger
Getter for the logger object.
policy_aliases
rollout_buffer
policy
observation_space
action_space
n_envs
lr_schedule
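The lr_schedule attribute is derived from the learning_rate constructor argument. In Stable-Baselines3's PPO, learning_rate may be either a constant float or a callable that maps the remaining training progress (1.0 at the start, 0.0 at the end) to a learning rate. The sketch below assumes PatchedPPO inherits this behavior from PPO unchanged; the helper name linear_schedule is illustrative and not part of schola.

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Build a schedule that decays linearly from initial_value to 0.

    The returned callable takes progress_remaining, which goes from
    1.0 (start of training) down to 0.0 (end of training).
    """

    def schedule(progress_remaining: float) -> float:
        # Scale the initial learning rate by the fraction of training left.
        return progress_remaining * initial_value

    return schedule


# Start at the documented default of 0.0003 and decay to 0.
lr = linear_schedule(0.0003)
print(lr(1.0), lr(0.5), lr(0.0))
```

Under that assumption, the callable would be passed as learning_rate=linear_schedule(0.0003) in place of the float default.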
Parameters:
- policy (ActorCriticPolicy)
- n_steps (int)
- batch_size (int)
- n_epochs (int)
- gamma (float)
- gae_lambda (float)
- normalize_advantage (bool)
- ent_coef (float)
- vf_coef (float)
- max_grad_norm (float)
- use_sde (bool)
- sde_sample_freq (int)
- target_kl (float | None)
- stats_window_size (int)
- tensorboard_log (str | None)
- verbose (int)
- seed (int | None)
- device (device | str)
- _init_setup_model (bool)
__init__(policy, env, learning_rate=0.0003, n_steps=2048, batch_size=64, n_epochs=10, gamma=0.99, gae_lambda=0.95, clip_range=0.2, clip_range_vf=None, normalize_advantage=True, ent_coef=0.0, vf_coef=0.5, max_grad_norm=0.5, use_sde=False, sde_sample_freq=-1, target_kl=None, stats_window_size=100, tensorboard_log=None, policy_kwargs=None, verbose=0, seed=None, device='auto', _init_setup_model=True)
Parameters:
- n_steps (int)
- batch_size (int)
- n_epochs (int)
- gamma (float)
- gae_lambda (float)
- normalize_advantage (bool)
- ent_coef (float)
- vf_coef (float)
- max_grad_norm (float)
- use_sde (bool)
- sde_sample_freq (int)
- target_kl (float | None)
- stats_window_size (int)
- tensorboard_log (str | None)
- verbose (int)
- seed (int | None)
- device (device | str)
- _init_setup_model (bool)