schola.sb3.action_space_patch.HybridDistribution
- class schola.sb3.action_space_patch.HybridDistribution(distributions, discrete_norm_factor=1.0, continuous_norm_factor=1.0)[source]
-
Bases:
DiagGaussianDistribution
A composite distribution supporting discrete and continuous sub-distributions.
- Parameters:
-
-
distributions (OrderedDict[str,Distribution]) – A dictionary of distributions to use for the composite distribution.
-
discrete_norm_factor (float, default=1.0) – The normalization factor applied to discrete actions.
-
continuous_norm_factor (float, default=1.0) – The normalization factor applied to continuous actions.
-
- distributions
-
A dictionary of distributions to use for the composite distribution.
- Type:
-
OrderedDict[str,Distribution]
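Since the reference above is terse, the composite pattern the class implements can be sketched with toy stand-ins: the flat action's size is the sum of the branch sizes, keyed by branch name. The `ToyBranch` and `ToyComposite` names below are hypothetical illustrations, not the schola.sb3 or stable-baselines3 classes:

```python
# Illustrative sketch only: toy stand-ins mirroring the composite
# distribution pattern (NOT the schola.sb3 / stable-baselines3 code).
from collections import OrderedDict


class ToyBranch:
    """A stand-in for one sub-distribution (one action-space branch)."""

    def __init__(self, action_dim):
        self.action_dim = action_dim


class ToyComposite:
    """Treats the flat action as the concatenation of all branch actions."""

    def __init__(self, distributions):
        self.distributions = distributions

    @property
    def action_dim(self):
        # Total flat size = sum of every branch's size.
        return sum(d.action_dim for d in self.distributions.values())

    @property
    def action_dims(self):
        # Per-branch sizes, keyed by branch name.
        return {k: d.action_dim for k, d in self.distributions.items()}


dists = OrderedDict([("discrete", ToyBranch(3)), ("continuous", ToyBranch(2))])
composite = ToyComposite(dists)
print(composite.action_dim)   # 5
print(composite.action_dims)  # {'discrete': 3, 'continuous': 2}
```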
Methods
- __init__(distributions[, …])
- action_generator(action)
Takes an action sampled from this distribution and generates the actions corresponding to each branch of the distribution (e.g. if we have 2 box spaces, it generates a sequence of 2 values sampled from those distributions).
- actions_from_params(action_logits, log_std)
Returns samples from the probability distribution given its parameters.
- entropy()
Returns Shannon’s entropy of the probability distribution.
- get_actions([deterministic])
Return actions according to the probability distribution.
- log_prob(actions)
Get the log probabilities of actions according to the distribution.
- log_prob_from_params(mean_actions, log_std)
Compute the log probability of taking an action given the distribution parameters.
- map_dists(func[, normalize])
Maps a function over the distributions in the composite distribution.
- mode()
Returns the most likely action (deterministic output) from the probability distribution.
- proba_distribution(mean_actions, log_std)
Create the distribution given its parameters (mean, std).
- proba_distribution_net(latent_dim[, …])
Create the layers and parameter that represent the distribution: one output will be the mean of the Gaussian, the other parameter will be the standard deviation (log std in fact to allow negative values).
- sample()
Returns a sample from the probability distribution.
Attributes
- action_dim
The size of the action tensor corresponding to this distribution.
- action_dims
The size of the action tensor corresponding to each branch of the distribution.
- layer_dim
The number of neurons required for this distribution.
- layer_dims
The number of neurons required for each branch of the distribution.
- log_std_dim
The number of neurons required for the log standard deviation.
- log_std_dims
The number of neurons required for the log standard deviation of each branch.
- __init__(distributions, discrete_norm_factor=1.0, continuous_norm_factor=1.0)[source]
-
- Parameters:
-
distributions (OrderedDict)
- property action_dim: int
-
The size of the action tensor corresponding to this distribution.
- Returns:
-
The size of the action tensor corresponding to this distribution.
- Return type:
-
int
- property action_dims: Dict[str, int]
-
The size of the action tensor corresponding to each branch of the distribution.
- action_generator(action)[source]
-
Takes an action sampled from this distribution and generates the actions corresponding to each branch of the distribution (e.g. if we have 2 box spaces, it generates a sequence of 2 values sampled from those distributions).
- Parameters:
-
action (th.Tensor) – The action to generate the sub-actions from.
- Yields:
-
th.Tensor – The sub-action corresponding to a branch of the distribution.
- Return type:
-
Iterable[Tensor]
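The branch-splitting behaviour of action_generator can be sketched with plain Python lists in place of torch tensors, assuming the flat action concatenates the branches in distribution order. The `split_action` helper below is an illustration of the idea, not the library function:

```python
from collections import OrderedDict


def split_action(action, action_dims):
    """Yield each branch's slice of a flat action, in branch order.

    Toy version of the action_generator idea: `action` is a flat
    sequence assumed to concatenate the branches back-to-back.
    """
    offset = 0
    for name, dim in action_dims.items():
        yield name, action[offset:offset + dim]
        offset += dim


dims = OrderedDict([("discrete", 3), ("continuous", 2)])
flat = [0.1, 0.7, 0.2, -0.5, 0.9]  # 3 discrete logits + 2 continuous values
for name, sub in split_action(flat, dims):
    print(name, sub)
# discrete [0.1, 0.7, 0.2]
# continuous [-0.5, 0.9]
```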
- actions_from_params(action_logits, log_std, deterministic=False)[source]
-
Returns samples from the probability distribution given its parameters.
- Parameters:
-
-
action_logits (Tensor)
-
log_std (Tensor)
-
deterministic (bool)
-
- Returns:
-
actions
- Return type:
-
Tensor
- entropy()[source]
-
Returns Shannon’s entropy of the probability distribution.
- Returns:
-
the entropy, or None if no analytical form is known
- Return type:
-
Tensor
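Assuming the branches are independent, so that the composite entropy is the sum of the branch entropies (an assumption about the implementation, not stated by the reference), the calculation can be sketched for one categorical and one diagonal-Gaussian branch:

```python
import math


def gaussian_entropy(log_std):
    """Entropy of one Gaussian dimension: 0.5 * log(2*pi*e) + log_std."""
    return 0.5 * math.log(2 * math.pi * math.e) + log_std


def categorical_entropy(probs):
    """Shannon entropy of a categorical distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Assumed composite behaviour: total entropy = sum of branch entropies.
branch_entropies = [
    categorical_entropy([0.5, 0.25, 0.25]),  # discrete branch
    gaussian_entropy(log_std=0.0),           # one continuous dimension
]
total = sum(branch_entropies)
print(round(total, 4))
```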
- property layer_dim: int
-
The number of neurons required for this distribution.
- Returns:
-
The number of neurons required for this distribution.
- Return type:
-
int
- property layer_dims: Dict[str, int]
-
The number of neurons required for each branch of the distribution.
- log_prob(actions)[source]
-
Get the log probabilities of actions according to the distribution. Note that you must first call the proba_distribution() method.
- Parameters:
-
actions (Tensor) – The actions to evaluate.
- Returns:
-
The log probability of the actions.
- Return type:
-
Tensor
- log_prob_from_params(mean_actions, log_std)[source]
-
Compute the log probability of taking an action given the distribution parameters.
- Parameters:
-
-
mean_actions (Tensor)
-
log_std (Tensor)
-
- Returns:
-
The actions and their associated log probability.
- Return type:
-
Tuple[Tensor, Tensor]
- property log_std_dim: int
-
The number of neurons required for the log standard deviation.
- Returns:
-
The number of neurons required for the log standard deviation.
- Return type:
-
int
- property log_std_dims: Dict[str, int]
-
The number of neurons required for the log standard deviation of each branch.
- map_dists(func, normalize=False)[source]
-
Maps a function over the distributions in the composite distribution.
- Parameters:
-
-
func (Callable[[Distribution], Any]) – The function to map over the distributions.
-
normalize (bool, optional) – Whether to normalize the output of the function using the norm factors, by default False
-
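A sketch of the map_dists idea, assuming (as the normalize parameter suggests) that each branch's result is scaled by the matching discrete or continuous normalization factor. The `ToyDist` class and the factor-selection logic are illustrative assumptions, not the schola.sb3 implementation:

```python
from collections import OrderedDict


class ToyDist:
    """Hypothetical stand-in for a sub-distribution; real branches
    come from stable-baselines3 Distribution subclasses."""

    def __init__(self, kind, value):
        self.kind = kind    # "discrete" or "continuous"
        self.value = value


def map_dists(distributions, func, normalize=False,
              discrete_norm_factor=1.0, continuous_norm_factor=1.0):
    """Apply `func` to each sub-distribution, optionally scaling the
    result by the branch's normalization factor (assumed semantics)."""
    for dist in distributions.values():
        out = func(dist)
        if normalize:
            factor = (discrete_norm_factor if dist.kind == "discrete"
                      else continuous_norm_factor)
            out = out * factor
        yield out


dists = OrderedDict([
    ("d", ToyDist("discrete", 4.0)),
    ("c", ToyDist("continuous", 10.0)),
])
vals = list(map_dists(dists, lambda d: d.value, normalize=True,
                      discrete_norm_factor=0.5, continuous_norm_factor=0.1))
print(vals)  # [2.0, 1.0]
```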
- mode()[source]
-
Returns the most likely action (deterministic output) from the probability distribution
- Returns:
-
the most likely (deterministic) action
- proba_distribution(mean_actions, log_std)[source]
-
Create the distribution given its parameters (mean, std)
- Parameters:
-
-
mean_actions (Tensor)
-
log_std (Tensor)
-
- Returns:
-
The distribution with updated parameters.
- proba_distribution_net(latent_dim, log_std_init=0.0)[source]
-
Create the layers and parameter that represent the distribution: one output will be the mean of the Gaussian, the other parameter will be the standard deviation (log std in fact to allow negative values)
- Parameters:
-
-
latent_dim – Dimension of the last layer of the policy (before the action layer)
-
log_std_init (float) – Initial value for the log standard deviation
-
- Returns:
-
The network that outputs the mean actions and the parameter for the log standard deviation.
- sample()[source]
-
Returns a sample from the probability distribution
- Returns:
-
the stochastic action
- Return type:
-
Tensor