schola.sb3.action_space_patch.HybridDistribution
class schola.sb3.action_space_patch.HybridDistribution(distributions, discrete_norm_factor=1.0, continuous_norm_factor=1.0)
Bases: DiagGaussianDistribution
A composite distribution supporting discrete and continuous sub-distributions.
Parameters: : - distributions (OrderedDict[str, Distribution]) – A dictionary of distributions to use for the composite distribution.
- discrete_norm_factor (float, default=1.0) – The normalization factor for discrete actions.
- continuous_norm_factor (float, default=1.0) – The normalization factor for continuous actions.
distributions : A dictionary of distributions to use for the composite distribution.
Type: : OrderedDict[str, Distribution]
Methods
__init__ (distributions[, …]) | Initialize the composite distribution from an ordered dictionary of sub-distributions. |
action_generator (action) | Takes an Action Sampled from this distribution and generates the actions corresponding to each branch of the distribution (e.g. if we have 2 box spaces, it generates a sequence of 2 values sampled from those distributions). |
actions_from_params (action_logits, log_std) | Returns samples from the probability distribution given its parameters. |
entropy () | Returns Shannon’s entropy of the probability distribution. |
get_actions ([deterministic]) | Return actions according to the probability distribution. |
log_prob (actions) | Get the log probabilities of actions according to the distribution. |
log_prob_from_params (mean_actions, log_std) | Compute the log probability of taking an action given the distribution parameters. |
map_dists (func[, normalize]) | Maps a function over the distributions in the composite distribution. |
mode () | Returns the most likely action (deterministic output) from the probability distribution |
proba_distribution (mean_actions, log_std) | Create the distribution given its parameters (mean, std) |
proba_distribution_net (latent_dim[, …]) | Create the layers and parameter that represent the distribution: one output will be the mean of the Gaussian, the other parameter will be the standard deviation (log std in fact to allow negative values) |
sample () | Returns a sample from the probability distribution |
Attributes
action_dim | The size of the action tensor corresponding to this distribution. |
action_dims | The size of the action tensor corresponding to each branch of the distribution. |
layer_dim | The neurons required for this distribution. |
layer_dims | The number of neurons required for each branch of the distribution. |
log_std_dim | The number of neurons required for the log standard deviation. |
log_std_dims | The number of neurons required for the log standard deviation of each branch. |
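Conceptually, a composite (hybrid) distribution samples each branch independently and concatenates the results into one flat action vector. The following is a minimal pure-Python sketch of that idea only, not the Schola implementation (which operates on torch tensors); the `TinyBox` and `TinyDiscrete` classes are hypothetical stand-ins for the SB3 `Distribution` sub-distributions held in the `distributions` dictionary:

```python
import random
from collections import OrderedDict

class TinyBox:
    """Hypothetical continuous branch: independent unit Gaussians."""
    def __init__(self, dim):
        self.action_dim = dim
    def sample(self):
        return [random.gauss(0.0, 1.0) for _ in range(self.action_dim)]

class TinyDiscrete:
    """Hypothetical discrete branch: uniform choice over n options (one index)."""
    def __init__(self, n):
        self.n = n
        self.action_dim = 1
    def sample(self):
        return [random.randrange(self.n)]

def sample_hybrid(distributions):
    """Concatenate one sample from each branch into a flat action vector."""
    flat = []
    for dist in distributions.values():
        flat.extend(dist.sample())
    return flat

# A 2-D continuous branch followed by a binary discrete branch:
dists = OrderedDict([("move", TinyBox(2)), ("jump", TinyDiscrete(2))])
action = sample_hybrid(dists)
# Total action size is the sum of the branch sizes: 2 + 1 = 3.
assert len(action) == sum(d.action_dim for d in dists.values())
```

This mirrors the role of the `action_dim`/`action_dims` attributes above: the composite size is the sum over branches.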
__init__(distributions, discrete_norm_factor=1.0, continuous_norm_factor=1.0) : Parameters: : - distributions (OrderedDict[str, Distribution]) – A dictionary of distributions to use for the composite distribution.
- discrete_norm_factor (float, default=1.0) – The normalization factor for discrete actions.
- continuous_norm_factor (float, default=1.0) – The normalization factor for continuous actions.
property action_dim: int : The size of the action tensor corresponding to this distribution.
Returns: : The size of the action tensor corresponding to this distribution.
Return type: : int
property action_dims: Dict[str, int] : The size of the action tensor corresponding to each branch of the distribution.
Returns: : A dictionary mapping each branch of the distribution to the size of its action tensor.
Return type: : Dict[str, int]
action_generator(action) : Takes an action sampled from this distribution and generates the sub-actions corresponding to each branch of the distribution (e.g. with 2 box spaces, it yields a sequence of 2 values sampled from those distributions).
Parameters: : action (th.Tensor) – The action to generate the sub-actions from.
Yields: : th.Tensor – The sub-action corresponding to a branch of the distribution.
Return type: : Iterable[Tensor]
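The splitting behaviour described above can be sketched in plain Python: walk the flat action and yield one slice per branch. This is an illustration only, not the tensor-based Schola code, and the branch names and sizes here are hypothetical:

```python
from collections import OrderedDict

def split_action(action, action_dims):
    """Yield (branch name, slice of the flat action belonging to that branch)."""
    offset = 0
    for name, dim in action_dims.items():
        yield name, action[offset:offset + dim]
        offset += dim

# Hypothetical layout: a 2-D continuous branch followed by a 1-D discrete branch.
dims = OrderedDict([("move", 2), ("jump", 1)])
flat_action = [0.3, -0.7, 1]
parts = dict(split_action(flat_action, dims))
# parts == {"move": [0.3, -0.7], "jump": [1]}
```

The real `action_generator` yields only the per-branch tensors; the names are included here just to make the mapping visible.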
actions_from_params(action_logits, log_std, deterministic=False) : Returns samples from the probability distribution given its parameters.
Parameters: : - action_logits (Tensor)
- log_std (Tensor)
- deterministic (bool)
Returns: : actions
Return type: : Tensor
entropy() : Returns Shannon’s entropy of the probability distribution.
Returns: : the entropy, or None if no analytical form is known
Return type: : Tensor
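For independent branches, the entropy of the joint distribution is the sum of the branch entropies. A small numeric sketch using the standard closed forms (this illustrates the math, not the Schola tensor code):

```python
import math

def gaussian_entropy(std):
    """Differential entropy of a 1-D Gaussian: 0.5 * ln(2 * pi * e * std^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * std * std)

def categorical_entropy(probs):
    """Shannon entropy in nats: -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A hybrid of one unit Gaussian branch and one fair binary branch:
total = gaussian_entropy(1.0) + categorical_entropy([0.5, 0.5])
# Gaussian part ≈ 1.4189 nats, binary part = ln 2 ≈ 0.6931 nats.
```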
property layer_dim: int : The number of neurons required for this distribution.
Returns: : The number of neurons required for this distribution.
Return type: : int
property layer_dims: Dict[str, int] : The number of neurons required for each branch of the distribution.
Returns: : A dictionary mapping each branch of the distribution to the number of neurons required.
Return type: : Dict[str, int]
log_prob(actions) : Get the log probabilities of actions according to the distribution. Note that you must first call the proba_distribution() method.
Parameters: : actions (Tensor) – The actions whose log probability to compute.
Returns: : The log probability of the given actions.
Return type: : Tensor
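Under the same independence assumption as the entropy, the joint log probability of a hybrid action is the sum of the per-branch log probabilities. A minimal numeric sketch with the standard density formulas (illustrative only, not the Schola tensor code):

```python
import math

def gaussian_log_prob(x, mean, std):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def categorical_log_prob(index, probs):
    """Log probability of choosing `index` from a categorical distribution."""
    return math.log(probs[index])

# Joint log prob of a hybrid action: continuous part x = 0.0 under a unit
# Gaussian, discrete part index 1 under probabilities [0.25, 0.75].
joint = gaussian_log_prob(0.0, 0.0, 1.0) + categorical_log_prob(1, [0.25, 0.75])
```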
log_prob_from_params(mean_actions, log_std) : Compute the log probability of taking an action given the distribution parameters.
Parameters: : - mean_actions (Tensor)
- log_std (Tensor)
Returns: : The sampled actions and their log probability.
Return type: : Tuple[Tensor, Tensor]
property log_std_dim: int : The number of neurons required for the log standard deviation.
Returns: : The number of neurons required for the log standard deviation.
Return type: : int
property log_std_dims: Dict[str, int] : The number of neurons required for the log standard deviation of each branch.
Returns: : A dictionary mapping each branch of the distribution to the number of neurons required for its log standard deviation.
Return type: : Dict[str, int]
map_dists(func, normalize=False) : Maps a function over the distributions in the composite distribution.
Parameters: : - func (Callable[[Distribution], Any]) – The function to map over the distributions.
- normalize (bool, optional) – Whether to normalize the output of the function using the norm factors, by default False.
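A plain-Python sketch of what such a mapping might look like, using the discrete/continuous normalization factors from the constructor. The `Branch` class and its `is_discrete` flag are hypothetical stand-ins; the real method operates on SB3 `Distribution` objects:

```python
from collections import OrderedDict

def map_dists(distributions, func, discrete_norm_factor=1.0,
              continuous_norm_factor=1.0, normalize=False):
    """Apply func to each branch, optionally scaling by the matching norm factor."""
    results = []
    for dist in distributions.values():
        value = func(dist)
        if normalize:
            factor = discrete_norm_factor if dist.is_discrete else continuous_norm_factor
            value = value * factor
        results.append(value)
    return results

class Branch:
    """Hypothetical stand-in for a sub-distribution with a size attribute."""
    def __init__(self, action_dim, is_discrete):
        self.action_dim = action_dim
        self.is_discrete = is_discrete

dists = OrderedDict([("move", Branch(2, False)), ("jump", Branch(1, True))])
sizes = map_dists(dists, lambda d: d.action_dim)
# sizes == [2, 1]
scaled = map_dists(dists, lambda d: d.action_dim,
                   discrete_norm_factor=0.5, normalize=True)
# scaled == [2.0, 0.5]
```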
mode() : Returns the most likely action (deterministic output) from the probability distribution.
Returns: : the deterministic action
proba_distribution(mean_actions, log_std) : Create the distribution given its parameters (mean, std).
Parameters: : - mean_actions (Tensor)
- log_std (Tensor)
Returns: : The distribution, updated with the given parameters.
proba_distribution_net(latent_dim, log_std_init=0.0) : Create the layers and parameter that represent the distribution: one output will be the mean of the Gaussian; the other parameter will be the standard deviation (log std in fact, to allow negative values).
Parameters: : - latent_dim (int) – Dimension of the last layer of the policy (before the action layer)
- log_std_init (float) – Initial value for the log standard deviation
Returns: : The network producing the action parameters and the log standard deviation parameter.
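The sizes involved can be illustrated without torch. Per the `layer_dims` and `log_std_dims` attributes above, each branch contributes some output neurons, while the log-std parameter typically covers only the continuous branches. A sketch under those assumptions, with a hypothetical branch layout:

```python
from collections import OrderedDict

# Hypothetical per-branch sizes: a Discrete(3) branch needs 3 logits,
# a Box(2) branch needs 2 means plus 2 log-std parameters.
layer_dims = OrderedDict([("jump", 3), ("move", 2)])
log_std_dims = OrderedDict([("jump", 0), ("move", 2)])

# proba_distribution_net would size its outputs from these totals:
layer_dim = sum(layer_dims.values())      # mean/logit head outputs
log_std_dim = sum(log_std_dims.values())  # standalone log-std parameter
assert (layer_dim, log_std_dim) == (5, 2)
```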
sample() : Returns a sample from the probability distribution
Returns: : the stochastic action
Return type: : Tensor