schola.sb3.action_space_patch.HybridDistribution

class schola.sb3.action_space_patch.HybridDistribution(distributions, discrete_norm_factor=1.0, continuous_norm_factor=1.0)

Bases: DiagGaussianDistribution

A composite distribution supporting discrete and continuous sub-distributions.

Parameters: : - distributions (OrderedDict[str, Distribution]) – A dictionary of distributions to use for the composite distribution.

  • discrete_norm_factor (float, default=1.0) – The normalization factor for discrete actions.
  • continuous_norm_factor (float, default=1.0) – The normalization factor for continuous actions.

distributions : A dictionary of distributions to use for the composite distribution.

Type: : OrderedDict[str,Distribution]
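
The size bookkeeping across branches can be sketched with a hypothetical stand-in branch class (ToyBranch below is made up for illustration; it is not the real Schola sub-distribution API):

```python
from collections import OrderedDict

# Hypothetical stand-in for a sub-distribution; only the size
# bookkeeping of the composite is illustrated, not the real API.
class ToyBranch:
    def __init__(self, action_dim):
        self.action_dim = action_dim

# An ordered mapping from branch name to sub-distribution, mirroring
# the `distributions` constructor argument.
branches = OrderedDict([
    ("move", ToyBranch(2)),   # e.g. a 2-D continuous branch
    ("jump", ToyBranch(1)),   # e.g. a 1-D branch
])

# Per-branch sizes, and the total size of the flat action tensor
# obtained by concatenating the branches.
action_dims = {name: b.action_dim for name, b in branches.items()}
action_dim = sum(action_dims.values())
print(action_dims)  # {'move': 2, 'jump': 1}
print(action_dim)   # 3
```

The ordered dictionary matters: the composite lays branch actions out consecutively, so iteration order determines each branch's slice of the flat action tensor.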

Methods

__init__(distributions[, …])
action_generator(action) – Takes an action sampled from this distribution and generates the actions corresponding to each branch of the distribution (e.g. for two box spaces, it generates a sequence of two values sampled from those distributions).
actions_from_params(action_logits, log_std) – Returns samples from the probability distribution given its parameters.
entropy() – Returns Shannon's entropy of the probability distribution.
get_actions([deterministic]) – Return actions according to the probability distribution.
log_prob(actions) – Get the log probabilities of actions according to the distribution.
log_prob_from_params(mean_actions, log_std) – Compute the log probability of taking an action given the distribution parameters.
map_dists(func[, normalize]) – Maps a function over the distributions in the composite distribution.
mode() – Returns the most likely action (deterministic output) from the probability distribution.
proba_distribution(mean_actions, log_std) – Create the distribution given its parameters (mean, std).
proba_distribution_net(latent_dim[, …]) – Create the layers and parameter that represent the distribution: one output will be the mean of the Gaussian, the other parameter will be the standard deviation (log std in fact, to allow negative values).
sample() – Returns a sample from the probability distribution.

Attributes

action_dim – The size of the action tensor corresponding to this distribution.
action_dims – The size of the action tensor corresponding to each branch of the distribution.
layer_dim – The number of neurons required for this distribution.
layer_dims – The number of neurons required for each branch of the distribution.
log_std_dim – The number of neurons required for the log standard deviation.
log_std_dims – The number of neurons required for the log standard deviation of each branch.

__init__(distributions, discrete_norm_factor=1.0, continuous_norm_factor=1.0) : Parameters: : distributions (OrderedDict)

property action_dim: int : The size of the action tensor corresponding to this distribution.

Returns: : The size of the action tensor corresponding to this distribution.

Return type: : int

property action_dims: Dict[str, int] : The size of the action tensor corresponding to each branch of the distribution.

Returns: : A dictionary mapping branch of the distribution to the size of the action tensor corresponding to that branch.

Return type: : Dict[str,int]

action_generator(action) : Takes an action sampled from this distribution and generates the actions corresponding to each branch of the distribution (e.g. for two box spaces, it generates a sequence of two values sampled from those distributions).

Parameters: : action (th.Tensor) – The action to generate the sub-actions from.

Yields: : th.Tensor – The sub-action corresponding to a branch of the distribution.

Return type: : Iterable[Tensor]
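
A minimal sketch of the splitting this generator performs, assuming the composite simply concatenates branch actions into one flat vector (plain lists stand in for tensors; the branch names and sizes are illustrative):

```python
def split_action(action, action_dims):
    """Yield the consecutive slice of `action` belonging to each branch."""
    offset = 0
    for name, size in action_dims.items():
        yield action[offset:offset + size]
        offset += size

# Illustrative branch layout: 2 values for "move", 1 for "jump".
action_dims = {"move": 2, "jump": 1}
flat_action = [0.3, -0.7, 1.0]

sub_actions = list(split_action(flat_action, action_dims))
print(sub_actions)  # [[0.3, -0.7], [1.0]]
```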

actions_from_params(action_logits, log_std, deterministic=False) : Returns samples from the probability distribution given its parameters.

Parameters: : - action_logits (Tensor)

  • log_std (Tensor)
  • deterministic (bool)

Returns: : actions

Return type: : Tensor

entropy() : Returns Shannon's entropy of the probability distribution.

Returns: : the entropy, or None if no analytical form is known

Return type: : Tensor
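
For a diagonal Gaussian branch the entropy has the closed form H = Σᵢ ½ log(2πe σᵢ²), and for a composite of independent branches the branch entropies add. A generic illustration of that arithmetic, not Schola's implementation:

```python
import math

def diag_gaussian_entropy(log_stds):
    """Closed-form entropy of a diagonal Gaussian, summed over dims.

    0.5 * log(2*pi*e*sigma^2) == 0.5 * log(2*pi*e) + log_std per dim.
    """
    return sum(0.5 * math.log(2 * math.pi * math.e) + s for s in log_stds)

# Entropies of independent branches simply add for the composite.
total_entropy = (diag_gaussian_entropy([0.0, 0.0])   # two unit-variance dims
                 + diag_gaussian_entropy([0.5]))     # one wider dim
```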

property layer_dim: int : The number of neurons required for this distribution.

Returns: : The number of neurons required for this distribution

Return type: : int

property layer_dims: Dict[str, int] : The number of neurons required for each branch of the distribution.

Returns: : A dictionary mapping branch of the distribution to the number of neurons required.

Return type: : Dict[str,int]

log_prob(actions) : Get the log probabilities of actions according to the distribution. Note that you must first call the proba_distribution() method.

Parameters: : actions (Tensor) – The actions whose log probability is computed.

Returns: : The log probabilities of the actions.

Return type: : Tensor
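
As a point of reference for the continuous branches, the log probability of a diagonal Gaussian has a closed form, and independent branch log-probs add; a generic sketch, not the actual per-branch delegation Schola performs:

```python
import math

def diag_gaussian_log_prob(actions, means, log_stds):
    """Sum of per-dimension Gaussian log densities.

    log N(a; m, sigma) = -0.5*(a - m)^2/sigma^2 - log(sigma) - 0.5*log(2*pi)
    """
    total = 0.0
    for a, m, s in zip(actions, means, log_stds):
        var = math.exp(2 * s)
        total += -0.5 * ((a - m) ** 2 / var + math.log(2 * math.pi)) - s
    return total

# Standard normal density at its mean: log(1/sqrt(2*pi)) ≈ -0.9189.
lp = diag_gaussian_log_prob([0.0], [0.0], [0.0])
```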

log_prob_from_params(mean_actions, log_std) : Compute the log probability of taking an action given the distribution parameters.

Parameters: : - mean_actions (Tensor)

  • log_std (Tensor)

Returns: : The sampled actions and their log probability.

Return type: : Tuple[Tensor, Tensor]

property log_std_dim: int : The number of neurons required for the log standard deviation.

Returns: : The number of neurons required for the log standard deviation.

Return type: : int

property log_std_dims: Dict[str, int] : The number of neurons required for the log standard deviation of each branch.

Returns: : A dictionary mapping branch of the distribution to the number of neurons required for the log standard deviation.

Return type: : Dict[str,int]

map_dists(func, normalize=False) : Maps a function over the distributions in the composite distribution.

Parameters: : - func (Callable[[Distribution], Any]) – The function to map over the distributions.

  • normalize (bool, optional) – Whether to normalize the output of the function using the norm factors, by default False
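
A sketch of this kind of dispatch, assuming each branch is tagged as discrete or continuous so the matching norm factor can be applied when normalize is set (the tagging scheme and factor names here are illustrative assumptions, not the Schola internals):

```python
def map_over_branches(branches, func, norm_factors, normalize=False):
    """Apply `func` to every branch, optionally scaling by a norm factor."""
    results = {}
    for name, (kind, dist) in branches.items():
        value = func(dist)
        if normalize:
            value = value * norm_factors.get(kind, 1.0)
        results[name] = value
    return results

branches = {
    "move": ("continuous", [0.1, 0.2]),  # toy stand-ins for distributions
    "gear": ("discrete", [0, 1, 2]),
}
norm_factors = {"continuous": 1.0, "discrete": 0.5}

# Map `len` over the branches, normalizing by kind.
sizes = map_over_branches(branches, len, norm_factors, normalize=True)
print(sizes)  # {'move': 2.0, 'gear': 1.5}
```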

mode() : Returns the most likely action (deterministic output) from the probability distribution.

Returns: : the deterministic action

proba_distribution(mean_actions, log_std) : Create the distribution given its parameters (mean, std).

Parameters: : - mean_actions (Tensor)

  • log_std (Tensor)

Returns: : The updated distribution (self).

proba_distribution_net(latent_dim, log_std_init=0.0) : Create the layers and parameter that represent the distribution: one output will be the mean of the Gaussian, the other parameter will be the standard deviation (log std in fact, to allow negative values).

Parameters: : - latent_dim (int) – Dimension of the last layer of the policy (before the action layer)

  • log_std_init (float) – Initial value for the log standard deviation

Returns: : The network that outputs the mean actions and the log standard deviation parameter.
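
The shapes involved can be sketched in plain Python (the real method builds torch modules; this only illustrates the dimensions, and the helper name is made up):

```python
def proba_distribution_net_shapes(latent_dim, layer_dim, log_std_dim,
                                  log_std_init=0.0):
    """Shapes of the mean head and the free log-std parameter vector."""
    mean_head_shape = (latent_dim, layer_dim)   # linear layer weight matrix
    log_std = [log_std_init] * log_std_dim      # learnable parameter values
    return mean_head_shape, log_std

# e.g. a 64-wide latent feeding 3 action neurons, 2 of them Gaussian.
shape, log_std = proba_distribution_net_shapes(latent_dim=64,
                                               layer_dim=3,
                                               log_std_dim=2)
print(shape)    # (64, 3)
print(log_std)  # [0.0, 0.0]
```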

sample() : Returns a sample from the probability distribution

Returns: : the stochastic action

Return type: : Tensor