schola.sb3.action_space_patch.HybridDistribution

class schola.sb3.action_space_patch.HybridDistribution(distributions, discrete_norm_factor=1.0, continuous_norm_factor=1.0)

Bases: DiagGaussianDistribution

A composite distribution supporting discrete and continuous sub-distributions.

Parameters: : - distributions (OrderedDict[str, Distribution]) – A dictionary of distributions to use for the composite distribution.

  • discrete_norm_factor (float, default=1.0) – The normalization factor for discrete actions.
  • continuous_norm_factor (float, default=1.0) – The normalization factor for continuous actions.

distributions : A dictionary of distributions to use for the composite distribution.

Type: : OrderedDict[str,Distribution]
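
The size bookkeeping across branches can be sketched with a hypothetical stand-in branch class (ToyBranch below is made up for illustration; it is not the real Schola sub-distribution API):

```python
from collections import OrderedDict

# Hypothetical stand-in for a sub-distribution; only the size
# bookkeeping of the composite is illustrated, not the real API.
class ToyBranch:
    def __init__(self, action_dim):
        self.action_dim = action_dim

# An ordered mapping from branch name to sub-distribution, mirroring
# the `distributions` constructor argument.
branches = OrderedDict([
    ("move", ToyBranch(2)),   # e.g. a 2-D continuous branch
    ("jump", ToyBranch(1)),   # e.g. a 1-D branch
])

# Per-branch sizes, and the total size of the flat action tensor
# obtained by concatenating the branches.
action_dims = {name: b.action_dim for name, b in branches.items()}
action_dim = sum(action_dims.values())
print(action_dims)  # {'move': 2, 'jump': 1}
print(action_dim)   # 3
```

The ordered dictionary matters: the composite lays branch actions out consecutively, so iteration order determines each branch's slice of the flat action tensor.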

Methods

__init__(distributions[, …])
action_generator(action) – Takes an action sampled from this distribution and generates the actions corresponding to each branch of the distribution (e.g. for two box spaces, it generates a sequence of two values sampled from those distributions).
actions_from_params(action_logits, log_std) – Returns samples from the probability distribution given its parameters.
entropy() – Returns Shannon's entropy of the probability distribution.
get_actions([deterministic]) – Return actions according to the probability distribution.
log_prob(actions) – Get the log probabilities of actions according to the distribution.
log_prob_from_params(mean_actions, log_std) – Compute the log probability of taking an action given the distribution parameters.
map_dists(func[, normalize]) – Maps a function over the distributions in the composite distribution.
mode() – Returns the most likely action (deterministic output) from the probability distribution.
proba_distribution(mean_actions, log_std) – Create the distribution given its parameters (mean, std).
proba_distribution_net(latent_dim[, …]) – Create the layers and parameter that represent the distribution: one output will be the mean of the Gaussian, the other parameter will be the standard deviation (log std in fact, to allow negative values).
sample() – Returns a sample from the probability distribution.

Attributes

action_dim – The size of the action tensor corresponding to this distribution.
action_dims – The size of the action tensor corresponding to each branch of the distribution.
layer_dim – The number of neurons required for this distribution.
layer_dims – The number of neurons required for each branch of the distribution.
log_std_dim – The number of neurons required for the log standard deviation.
log_std_dims – The number of neurons required for the log standard deviation of each branch.

__init__(distributions, discrete_norm_factor=1.0, continuous_norm_factor=1.0) : Parameters: : distributions (OrderedDict)

property action_dim: int : The size of the action tensor corresponding to this distribution.

Returns: : The size of the action tensor corresponding to this distribution.

Return type: : int

property action_dims: Dict[str, int] : The size of the action tensor corresponding to each branch of the distribution.

Returns: : A dictionary mapping branch of the distribution to the size of the action tensor corresponding to that branch.

Return type: : Dict[str,int]

action_generator(action) : Takes an action sampled from this distribution and generates the actions corresponding to each branch of the distribution (e.g. for two box spaces, it generates a sequence of two values sampled from those distributions).

Parameters: : action (th.Tensor) – The action to generate the sub-actions from.

Yields: : th.Tensor – The sub-action corresponding to a branch of the distribution.

Return type: : Iterable[Tensor]
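
A minimal sketch of the splitting this generator performs, assuming the composite simply concatenates branch actions into one flat vector (plain lists stand in for tensors; the branch names and sizes are illustrative):

```python
def split_action(action, action_dims):
    """Yield the consecutive slice of `action` belonging to each branch."""
    offset = 0
    for name, size in action_dims.items():
        yield action[offset:offset + size]
        offset += size

# Illustrative branch layout: 2 values for "move", 1 for "jump".
action_dims = {"move": 2, "jump": 1}
flat_action = [0.3, -0.7, 1.0]

sub_actions = list(split_action(flat_action, action_dims))
print(sub_actions)  # [[0.3, -0.7], [1.0]]
```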

actions_from_params(action_logits, log_std, deterministic=False) : Returns samples from the probability distribution given its parameters.

Parameters: : - action_logits (Tensor)

  • log_std (Tensor)
  • deterministic (bool)

Returns: : actions

Return type: : Tensor

entropy() : Returns Shannon's entropy of the probability distribution.

Returns: : the entropy, or None if no analytical form is known

Return type: : Tensor
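
For a diagonal Gaussian branch the entropy has the closed form H = Σᵢ ½ log(2πe σᵢ²), and for a composite of independent branches the branch entropies add. A generic illustration of that arithmetic, not Schola's implementation:

```python
import math

def diag_gaussian_entropy(log_stds):
    """Closed-form entropy of a diagonal Gaussian, summed over dims.

    0.5 * log(2*pi*e*sigma^2) == 0.5 * log(2*pi*e) + log_std per dim.
    """
    return sum(0.5 * math.log(2 * math.pi * math.e) + s for s in log_stds)

# Entropies of independent branches simply add for the composite.
total_entropy = (diag_gaussian_entropy([0.0, 0.0])   # two unit-variance dims
                 + diag_gaussian_entropy([0.5]))     # one wider dim
```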

property layer_dim: int : The number of neurons required for this distribution.

Returns: : The number of neurons required for this distribution

Return type: : int

property layer_dims: Dict[str, int] : The number of neurons required for each branch of the distribution.

Returns: : A dictionary mapping branch of the distribution to the number of neurons required.

Return type: : Dict[str,int]

log_prob(actions) : Get the log probabilities of actions according to the distribution. Note that you must first call the proba_distribution() method.

Parameters: : actions (Tensor) – The actions whose log probability is computed.

Returns: : The log probabilities of the actions.

Return type: : Tensor
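
As a point of reference for the continuous branches, the log probability of a diagonal Gaussian has a closed form, and independent branch log-probs add; a generic sketch, not the actual per-branch delegation Schola performs:

```python
import math

def diag_gaussian_log_prob(actions, means, log_stds):
    """Sum of per-dimension Gaussian log densities.

    log N(a; m, sigma) = -0.5*(a - m)^2/sigma^2 - log(sigma) - 0.5*log(2*pi)
    """
    total = 0.0
    for a, m, s in zip(actions, means, log_stds):
        var = math.exp(2 * s)
        total += -0.5 * ((a - m) ** 2 / var + math.log(2 * math.pi)) - s
    return total

# Standard normal density at its mean: log(1/sqrt(2*pi)) ≈ -0.9189.
lp = diag_gaussian_log_prob([0.0], [0.0], [0.0])
```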

log_prob_from_params(mean_actions, log_std) : Compute the log probability of taking an action given the distribution parameters.

Parameters: : - mean_actions (Tensor)

  • log_std (Tensor)

Returns: : The sampled actions and their log probability.

Return type: : Tuple[Tensor, Tensor]

property log_std_dim: int : The number of neurons required for the log standard deviation.

Returns: : The number of neurons required for the log standard deviation.

Return type: : int

property log_std_dims: Dict[str, int] : The number of neurons required for the log standard deviation of each branch.

Returns: : A dictionary mapping branch of the distribution to the number of neurons required for the log standard deviation.

Return type: : Dict[str,int]

map_dists(func, normalize=False) : Maps a function over the distributions in the composite distribution.

Parameters: : - func (Callable[[Distribution], Any]) – The function to map over the distributions.

  • normalize (bool, optional) – Whether to normalize the output of the function using the norm factors, by default False
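
A sketch of this kind of dispatch, assuming each branch is tagged as discrete or continuous so the matching norm factor can be applied when normalize is set (the tagging scheme and factor names here are illustrative assumptions, not the Schola internals):

```python
def map_over_branches(branches, func, norm_factors, normalize=False):
    """Apply `func` to every branch, optionally scaling by a norm factor."""
    results = {}
    for name, (kind, dist) in branches.items():
        value = func(dist)
        if normalize:
            value = value * norm_factors.get(kind, 1.0)
        results[name] = value
    return results

branches = {
    "move": ("continuous", [0.1, 0.2]),  # toy stand-ins for distributions
    "gear": ("discrete", [0, 1, 2]),
}
norm_factors = {"continuous": 1.0, "discrete": 0.5}

# Map `len` over the branches, normalizing by kind.
sizes = map_over_branches(branches, len, norm_factors, normalize=True)
print(sizes)  # {'move': 2.0, 'gear': 1.5}
```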

mode() : Returns the most likely action (deterministic output) from the probability distribution.

Returns: : the deterministic action

proba_distribution(mean_actions, log_std) : Create the distribution given its parameters (mean, std).

Parameters: : - mean_actions (Tensor)

  • log_std (Tensor)

Returns: : The updated distribution (self).

proba_distribution_net(latent_dim, log_std_init=0.0) : Create the layers and parameter that represent the distribution: one output will be the mean of the Gaussian, the other parameter will be the standard deviation (log std in fact, to allow negative values).

Parameters: : - latent_dim (int) – Dimension of the last layer of the policy (before the action layer)

  • log_std_init (float) – Initial value for the log standard deviation

Returns: : The network that outputs the mean actions and the log standard deviation parameter.
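
The shapes involved can be sketched in plain Python (the real method builds torch modules; this only illustrates the dimensions, and the helper name is made up):

```python
def proba_distribution_net_shapes(latent_dim, layer_dim, log_std_dim,
                                  log_std_init=0.0):
    """Shapes of the mean head and the free log-std parameter vector."""
    mean_head_shape = (latent_dim, layer_dim)   # linear layer weight matrix
    log_std = [log_std_init] * log_std_dim      # learnable parameter values
    return mean_head_shape, log_std

# e.g. a 64-wide latent feeding 3 action neurons, 2 of them Gaussian.
shape, log_std = proba_distribution_net_shapes(latent_dim=64,
                                               layer_dim=3,
                                               log_std_dim=2)
print(shape)    # (64, 3)
print(log_std)  # [0.0, 0.0]
```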

sample() : Returns a sample from the probability distribution

Returns: : the stochastic action

Return type: : Tensor