schola.sb3.action_space_patch.HybridDistribution
Class Definition
class schola.sb3.action_space_patch.HybridDistribution(distributions, discrete_norm_factor=1.0, continuous_norm_factor=1.0)
Bases: DiagGaussianDistribution
A composite distribution supporting discrete and continuous sub-distributions.
Parameters
distributions
Type: OrderedDict[str, Distribution]
A dictionary of distributions to use for the composite distribution.
discrete_norm_factor
Type: float, optional
Default: 1.0
The normalization factor for discrete actions.
continuous_norm_factor
Type: float, optional
Default: 1.0
The normalization factor for continuous actions.
Attributes
distributions
Type: OrderedDict[str, Distribution]
A dictionary of distributions to use for the composite distribution.
Properties
action_dim
Type: int
(property)
The size of the action tensor corresponding to this distribution.
Returns: The size of the action tensor corresponding to this distribution.
action_dims
Type: Dict[str, int]
(property)
The size of the action tensor corresponding to each branch of the distribution.
Returns: A dictionary mapping each branch of the distribution to the size of the action tensor corresponding to that branch.
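As an illustration of how the per-branch and aggregate sizes might relate, the sketch below assumes that action_dim is the sum of the values in action_dims. That relationship is an assumption, not something this page states. The branch names and sizes are hypothetical, and the snippet uses plain Python rather than Schola itself.

```python
from collections import OrderedDict

# Hypothetical per-branch action sizes; the names "move" and "jump" are
# illustrative only and do not come from Schola.
action_dims = OrderedDict([("move", 2), ("jump", 1)])

# Assumption: the flat action tensor concatenates every branch, so the
# total size is the sum of the per-branch sizes.
action_dim = sum(action_dims.values())

print(action_dim)
```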
layer_dim
Type: int
(property)
The number of neurons required for this distribution.
Returns: The number of neurons required for this distribution.
layer_dims
Type: Dict[str, int]
(property)
The number of neurons required for each branch of the distribution.
Returns: A dictionary mapping each branch of the distribution to the number of neurons required for that branch.
log_std_dim
Type: int
(property)
The number of neurons required for the log standard deviation.
Returns: The number of neurons required for the log standard deviation.
log_std_dims
Type: Dict[str, int]
(property)
The number of neurons required for the log standard deviation of each branch.
Returns: A dictionary mapping each branch of the distribution to the number of neurons required for its log standard deviation.
Methods
__init__
__init__(distributions, discrete_norm_factor=1.0, continuous_norm_factor=1.0)
Parameters:
- distributions (OrderedDict) – A dictionary of distributions to use for the composite distribution
- discrete_norm_factor (float, optional) – The normalization factor for discrete actions, by default 1.0
- continuous_norm_factor (float, optional) – The normalization factor for continuous actions, by default 1.0
action_generator
action_generator(action)
Takes an action sampled from this distribution and yields the sub-action corresponding to each branch of the distribution (e.g. with two Box spaces, it yields a sequence of two values, one for each of those distributions).
Parameters:
- action (th.Tensor) – The action to generate the sub-actions from
Yields: th.Tensor – The sub-action corresponding to a branch of the distribution
Return type: Iterable[Tensor]
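The splitting behaviour described for action_generator can be pictured with the following minimal sketch. It is not Schola's implementation: it uses plain Python lists instead of tensors, the helper name split_action and the branch names are hypothetical, and it assumes the flat action stores the branches contiguously in dictionary order.

```python
from collections import OrderedDict
from typing import Iterator, List, Sequence


def split_action(action: Sequence[float], action_dims: "OrderedDict[str, int]") -> Iterator[List[float]]:
    """Yield one slice of `action` per branch, in branch order.

    Hypothetical helper: assumes branches are laid out back to back.
    """
    start = 0
    for size in action_dims.values():
        yield list(action[start:start + size])
        start += size


# Hypothetical layout: a 2-value "steer" branch followed by a 1-value "gear" branch.
action_dims = OrderedDict([("steer", 2), ("gear", 1)])
sub_actions = list(split_action([0.5, -0.25, 3.0], action_dims))
print(sub_actions)
```

Writing the helper as a generator mirrors the documented Iterable return type, so callers can consume one branch at a time.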
actions_from_params
actions_from_params(action_logits, log_std, deterministic=False)
Returns samples from the probability distribution given its parameters.
Parameters:
- action_logits (Tensor) – The action logits
- log_std (Tensor) – The log standard deviation
- deterministic (bool, optional) – Whether to return deterministic actions, by default False
Returns: Tensor – The sampled actions
entropy
entropy()
Returns the Shannon entropy of the probability distribution.
Returns: Tensor – The entropy, or None if no analytical form is known
get_actions
get_actions(deterministic=False)
Return actions according to the probability distribution.
Parameters:
- deterministic (bool, optional) – Whether to return deterministic actions, by default False
Returns: The sampled actions
log_prob
log_prob(actions)
Get the log probabilities of actions according to the distribution.
Parameters:
- actions (Tensor) – The actions to evaluate
Returns: Tensor – The log probabilities
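For intuition about a log probability over a mixed discrete and continuous action, the sketch below works through the arithmetic under the assumption that the branches are independent, in which case the joint log probability is the sum of the per-branch log probabilities. This page does not state how HybridDistribution combines its branches, and the sketch ignores the normalization factors. All function names are hypothetical.

```python
import math


def gaussian_log_prob(x: float, mean: float, std: float) -> float:
    """Log density of a 1-D Gaussian evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def categorical_log_prob(index: int, probs: list) -> float:
    """Log probability of choosing `index` under a categorical distribution."""
    return math.log(probs[index])


# One continuous branch and one discrete branch (illustrative values).
continuous_lp = gaussian_log_prob(0.0, mean=0.0, std=1.0)
discrete_lp = categorical_log_prob(1, [0.25, 0.75])

# Assumption: independent branches, so log p(a) = sum of branch log probs.
joint_lp = continuous_lp + discrete_lp
print(joint_lp)
```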
log_prob_from_params
log_prob_from_params(mean_actions, log_std)
Compute the log probability of taking an action given the distribution parameters.
Parameters:
- mean_actions (Tensor) – The mean actions
- log_std (Tensor) – The log standard deviation
Returns: Tuple[Tensor, Tensor] – The log probabilities and entropy
map_dists
map_dists(func, normalize=False)
Maps a function over the distributions in the composite distribution.
Parameters:
- func (Callable[[Distribution], Any]) – The function to map over the distributions
- normalize (bool, optional) – Whether to normalize the output of the function using the norm factors, by default False
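The mapping idea can be sketched with stand-in classes, as below. This is a hedged illustration rather than the library's code: FakeDiscrete, FakeContinuous, and map_branches are hypothetical names, and the way the norm factors are applied (here as multiplicative weights chosen by branch type) is an assumption, since this page does not specify it.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class FakeDiscrete:
    """Hypothetical stand-in for a discrete sub-distribution."""
    is_discrete = True

    def entropy(self) -> float:
        return 2.0


class FakeContinuous:
    """Hypothetical stand-in for a continuous sub-distribution."""
    is_discrete = False

    def entropy(self) -> float:
        return 3.0


def map_branches(distributions: "OrderedDict[str, Any]", func: Callable[[Any], Any],
                 normalize: bool = False, discrete_norm_factor: float = 1.0,
                 continuous_norm_factor: float = 1.0) -> List[Any]:
    """Apply `func` to every branch, optionally scaling each result."""
    results = []
    for dist in distributions.values():
        value = func(dist)
        if normalize:
            # Assumption: the factor is a multiplicative weight per branch type.
            factor = discrete_norm_factor if dist.is_discrete else continuous_norm_factor
            value = value * factor
        results.append(value)
    return results


dists = OrderedDict([("gear", FakeDiscrete()), ("steer", FakeContinuous())])
raw = map_branches(dists, lambda d: d.entropy())
scaled = map_branches(dists, lambda d: d.entropy(), normalize=True,
                      discrete_norm_factor=0.5, continuous_norm_factor=2.0)
print(raw, scaled)
```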
mode
mode()
Returns the most likely action (deterministic output) from the probability distribution.
Returns: Tensor – The deterministic (most likely) action
proba_distribution
proba_distribution(mean_actions, log_std)
Create the distribution given its parameters (mean, std).
Parameters:
- mean_actions (Tensor) – The mean actions
- log_std (Tensor) – The log standard deviation
proba_distribution_net
proba_distribution_net(latent_dim, log_std_init=0.0)
Create the layers and parameters that represent the distribution: one output is the mean of the Gaussian, and the other parameter is the standard deviation (stored as the log std so that the parameter can take negative values).
Parameters:
- latent_dim (int) – Dimension of the last layer of the policy (before the action layer)
- log_std_init (float, optional) – Initial value for the log standard deviation, by default 0.0
sample
sample()
Returns a sample from the probability distribution.
Returns: Tensor – The stochastic action