schola.sb3.action_space_patch.HybridDistribution

Class Definition

class schola.sb3.action_space_patch.HybridDistribution(
    distributions,
    discrete_norm_factor=1.0,
    continuous_norm_factor=1.0
)

Bases: DiagGaussianDistribution

A composite distribution supporting discrete and continuous sub-distributions.

Parameters

distributions

Type: OrderedDict[str, Distribution]
A dictionary of distributions to use for the composite distribution.

discrete_norm_factor

Type: float, optional
Default: 1.0
The normalization factor for discrete actions.

continuous_norm_factor

Type: float, optional
Default: 1.0
The normalization factor for continuous actions.

Attributes

distributions

Type: OrderedDict[str, Distribution]
A dictionary of distributions to use for the composite distribution.

Properties

action_dim

Type: int (property)
The size of the action tensor corresponding to this distribution.

Returns: The size of the action tensor corresponding to this distribution.

action_dims

Type: Dict[str, int] (property)
The size of the action tensor corresponding to each branch of the distribution.

Returns: A dictionary mapping each branch of the distribution to the size of the action tensor corresponding to that branch.

layer_dim

Type: int (property)
The number of neurons required for this distribution.

Returns: The number of neurons required for this distribution.

layer_dims

Type: Dict[str, int] (property)
The number of neurons required for each branch of the distribution.

Returns: A dictionary mapping each branch of the distribution to the number of neurons required for that branch.

log_std_dim

Type: int (property)
The number of neurons required for the log standard deviation.

Returns: The number of neurons required for the log standard deviation.

log_std_dims

Type: Dict[str, int] (property)
The number of neurons required for the log standard deviation of each branch.

Returns: A dictionary mapping each branch of the distribution to the number of neurons required for the log standard deviation of that branch.
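
The three size properties are easy to confuse, so here is a minimal sketch in plain Python of how they could differ. It assumes the usual Stable-Baselines3 conventions (a categorical branch with n choices emits one action value but needs n logits; a diagonal Gaussian branch of dimension d needs d mean outputs and d log-std values) and assumes the totals are sums over branches. The `BranchSpec` helper and the branch names `gear` and `steering` are hypothetical and not part of Schola.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class BranchSpec:
    """Hypothetical stand-in for one sub-distribution (not a Schola class)."""

    kind: str  # "categorical" or "gaussian"
    size: int  # number of choices (categorical) or dimensions (gaussian)

    @property
    def action_dim(self) -> int:
        # A categorical branch emits a single index; a Gaussian emits `size` floats.
        return 1 if self.kind == "categorical" else self.size

    @property
    def layer_dim(self) -> int:
        # Both kinds need `size` network outputs: logits or means.
        return self.size

    @property
    def log_std_dim(self) -> int:
        # Only Gaussian branches carry a log standard deviation.
        return self.size if self.kind == "gaussian" else 0


branches = OrderedDict(
    gear=BranchSpec("categorical", 3),
    steering=BranchSpec("gaussian", 2),
)

# Per-branch dictionaries, analogous to action_dims / layer_dims / log_std_dims.
action_dims = {name: b.action_dim for name, b in branches.items()}
layer_dims = {name: b.layer_dim for name, b in branches.items()}
log_std_dims = {name: b.log_std_dim for name, b in branches.items()}

# Totals, analogous to action_dim / layer_dim / log_std_dim (assumed to be sums).
action_dim = sum(action_dims.values())
layer_dim = sum(layer_dims.values())
log_std_dim = sum(log_std_dims.values())
```

In this sketch the policy head would need 5 outputs and 2 log-std parameters to produce a 3-element action.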

Methods

__init__

__init__(distributions, discrete_norm_factor=1.0, continuous_norm_factor=1.0)

Parameters:

distributions (OrderedDict) – A dictionary of distributions to use for the composite distribution
discrete_norm_factor (float, optional) – The normalization factor for discrete actions, by default 1.0
continuous_norm_factor (float, optional) – The normalization factor for continuous actions, by default 1.0

action_generator

action_generator(action)

Takes an action sampled from this distribution and yields the sub-action corresponding to each branch of the distribution (e.g. with 2 box spaces, it yields a sequence of 2 values, one for each space).

Parameters:

action (th.Tensor) – The action to generate the sub-actions from

Yields: th.Tensor – The sub-action corresponding to a branch of the distribution

Return type: Iterable[Tensor]
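
The splitting described above can be sketched with plain Python lists. This is only an illustration: the real method operates on `th.Tensor` objects, and the assumption here is that each branch occupies a contiguous slice of the flat action, in dictionary order, with widths taken from `action_dims`. The helper name `split_action` and the branch names are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Iterator, List


def split_action(action: List[float], action_dims: Dict[str, int]) -> Iterator[List[float]]:
    """Yield one contiguous slice of `action` per branch, in dictionary order."""
    start = 0
    for dim in action_dims.values():
        yield action[start:start + dim]
        start += dim


# Hypothetical layout: one discrete value followed by two continuous values.
action_dims = OrderedDict(gear=1, steering=2)
flat_action = [2.0, -0.3, 0.8]

sub_actions = list(split_action(flat_action, action_dims))
```

Writing the helper as a generator mirrors the documented `Iterable[Tensor]` return type: callers can zip the yielded pieces with the branch distributions without building an intermediate list.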

actions_from_params

actions_from_params(action_logits, log_std, deterministic=False)

Returns samples from the probability distribution given its parameters.

Parameters:

action_logits (Tensor) – The action logits
log_std (Tensor) – The log standard deviation
deterministic (bool, optional) – Whether to return deterministic actions, by default False

Returns: Tensor – The sampled actions

entropy

entropy()

Returns Shannon’s entropy of the probability distribution.

Returns: Tensor – The entropy, or None if no analytical form is known
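
For independent branches, the entropy of the joint distribution is the sum of the per-branch entropies. The sketch below illustrates that identity for one categorical branch and one diagonal Gaussian branch using only the standard library. It is an assumption that `HybridDistribution` combines branches this way, and the sketch leaves out the normalization factors. All function names are hypothetical.

```python
import math
from typing import List


def categorical_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def diag_gaussian_entropy(log_std: List[float]) -> float:
    """Entropy of a diagonal Gaussian: 0.5 * log(2 * pi * e) + log(sigma) per dimension."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e) + ls for ls in log_std)


def hybrid_entropy(probs: List[float], log_std: List[float]) -> float:
    """Joint entropy of independent branches is the sum of branch entropies."""
    return categorical_entropy(probs) + diag_gaussian_entropy(log_std)


# A fair two-way choice plus a 2-D unit-variance Gaussian.
total = hybrid_entropy([0.5, 0.5], [0.0, 0.0])
```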

get_actions

get_actions(deterministic=False)

Return actions according to the probability distribution.

Parameters:

deterministic (bool, optional) – Whether to return deterministic actions, by default False

Returns: Tensor – The sampled actions
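
The role of the `deterministic` flag can be sketched with a toy distribution. This follows the pattern of the Stable-Baselines3 base `Distribution`, where `get_actions` returns `mode()` when `deterministic` is true and `sample()` otherwise. Whether `HybridDistribution` overrides this behavior is not stated here, so treat the sketch as an assumption. `ToyCategorical` is hypothetical and returns plain integers rather than tensors.

```python
import random
from typing import List


class ToyCategorical:
    """Hypothetical single-branch distribution used only for illustration."""

    def __init__(self, probs: List[float], seed: int = 0) -> None:
        self.probs = probs
        self._rng = random.Random(seed)

    def mode(self) -> int:
        # Index with the highest probability.
        return max(range(len(self.probs)), key=lambda i: self.probs[i])

    def sample(self) -> int:
        # Random index drawn in proportion to the probabilities.
        return self._rng.choices(range(len(self.probs)), weights=self.probs, k=1)[0]

    def get_actions(self, deterministic: bool = False) -> int:
        # Deterministic calls use the mode; stochastic calls draw a sample.
        return self.mode() if deterministic else self.sample()


dist = ToyCategorical([0.1, 0.7, 0.2])
greedy = dist.get_actions(deterministic=True)
sampled = dist.get_actions()
```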

log_prob

log_prob(actions)

Get the log probabilities of actions according to the distribution.

Parameters:

actions (Tensor) – The actions to evaluate

Returns: Tensor – The log probabilities
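
When the branches are independent, the log probability of a hybrid action is the sum of the log probabilities of its sub-actions. The standard-library sketch below shows this for one categorical branch and one diagonal Gaussian branch. It assumes this additive combination and ignores the normalization factors. The function names are hypothetical, and the real method works on batched tensors.

```python
import math
from typing import List


def categorical_log_prob(probs: List[float], index: int) -> float:
    """Log probability of choosing `index` under a categorical distribution."""
    return math.log(probs[index])


def diag_gaussian_log_prob(x: List[float], mean: List[float], log_std: List[float]) -> float:
    """Log density of a diagonal Gaussian, summed over dimensions."""
    total = 0.0
    for xi, mu, ls in zip(x, mean, log_std):
        z = (xi - mu) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


# Discrete sub-action 1 with probability 0.75, continuous sub-action at the mean.
joint = categorical_log_prob([0.25, 0.75], 1) + diag_gaussian_log_prob([0.0], [0.0], [0.0])
```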

log_prob_from_params

log_prob_from_params(mean_actions, log_std)

Compute the log probability of taking an action given the distribution parameters.

Parameters:

mean_actions (Tensor) – The mean actions
log_std (Tensor) – The log standard deviation

Returns: Tuple[Tensor, Tensor] – The log probabilities and entropy

map_dists

map_dists(func, normalize=False)

Maps a function over the distributions in the composite distribution.

Parameters:

func (Callable[[Distribution], Any]) – The function to map over the distributions
normalize (bool, optional) – Whether to normalize the output of the function using the norm factors, by default False
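
A minimal sketch of the mapping pattern follows. How the library applies the norm factors is not documented here, so this sketch makes two assumptions: results are returned as a list in branch order, and normalization multiplies each result by the factor that matches the branch type. `ToyBranch`, `map_branches`, and the branch names are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, List


class ToyBranch:
    """Hypothetical stand-in for a sub-distribution."""

    def __init__(self, is_discrete: bool, value: float) -> None:
        self.is_discrete = is_discrete
        self.value = value


def map_branches(
    branches: "OrderedDict[str, ToyBranch]",
    func: Callable[[ToyBranch], float],
    normalize: bool = False,
    discrete_norm_factor: float = 1.0,
    continuous_norm_factor: float = 1.0,
) -> List[float]:
    """Apply `func` to every branch, optionally scaling by a per-type factor."""
    results = []
    for branch in branches.values():
        out = func(branch)
        if normalize:
            # Assumption: scaling is a multiplication by the matching factor.
            factor = discrete_norm_factor if branch.is_discrete else continuous_norm_factor
            out = out * factor
        results.append(out)
    return results


branches = OrderedDict(
    gear=ToyBranch(is_discrete=True, value=1.0),
    steering=ToyBranch(is_discrete=False, value=4.0),
)

raw = map_branches(branches, lambda b: b.value)
scaled = map_branches(
    branches,
    lambda b: b.value,
    normalize=True,
    discrete_norm_factor=0.5,
    continuous_norm_factor=2.0,
)
```

With the default factors of 1.0, `normalize=True` leaves the results unchanged in this sketch.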

mode

mode()

Returns the most likely action (deterministic output) from the probability distribution.

Returns: Tensor – The most likely (deterministic) action

proba_distribution

proba_distribution(mean_actions, log_std)

Create the distribution given its parameters (mean, log std).

Parameters:

mean_actions (Tensor) – The mean actions
log_std (Tensor) – The log standard deviation

proba_distribution_net

proba_distribution_net(latent_dim, log_std_init=0.0)

Create the layers and parameter that represent the distribution: one output is the mean of the Gaussian, and the other parameter is the log standard deviation (the log is used instead of the standard deviation itself so that the parameter may take negative values).

Parameters:

latent_dim (int) – Dimension of the last layer of the policy (before the action layer)
log_std_init (float, optional) – Initial value for the log standard deviation, by default 0.0

sample

sample()

Returns a sample from the probability distribution.

Returns: Tensor – The stochastic action