malib.common package
Submodules
malib.common.distributions module
Probability distributions. Reference: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/distributions.py
- class malib.common.distributions.BernoulliDistribution(action_dims: int)[source]
Bases:
Distribution
Bernoulli distribution for MultiBinary action spaces.
- Parameters:
action_dims – Number of binary actions.
- actions_from_params(action_logits: Tensor, deterministic: bool = False) Tensor [source]
Returns samples from the probability distribution given its parameters.
- Returns:
actions
- entropy() Tensor [source]
Returns Shannon’s entropy of the probability distribution.
- Returns:
the entropy, or None if no analytical form is known
- log_prob(actions: Tensor) Tensor [source]
Returns the log likelihood
- Parameters:
actions – the taken actions
- Returns:
The log likelihood of the distribution
- log_prob_from_params(action_logits: Tensor) Tuple[Tensor, Tensor] [source]
Returns samples and the associated log probabilities from the probability distribution given its parameters.
- Returns:
actions and log prob
- mode() Tensor [source]
Returns the most likely action (deterministic output) from the probability distribution
- Returns:
the deterministic action
- proba_distribution(action_logits: Tensor) BernoulliDistribution [source]
Set parameters of the distribution.
- Returns:
self
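A minimal usage sketch, assuming the SB3-style two-step API documented above (shapes and values are illustrative):

```python
import torch

from malib.common.distributions import BernoulliDistribution

dist = BernoulliDistribution(action_dims=3)    # 3 binary action dimensions
action_logits = torch.randn(4, 3)              # batch of 4 logit vectors
dist = dist.proba_distribution(action_logits)  # set the parameters (returns self)
actions = dist.get_actions()                   # stochastic sample of 0/1 actions
log_prob = dist.log_prob(actions)              # log likelihood of the sample
```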
- class malib.common.distributions.CategoricalDistribution(action_dim: int)[source]
Bases:
Distribution
Categorical distribution for discrete actions.
- Parameters:
action_dim (int) – Number of discrete actions.
- actions_from_params(action_logits: Tensor, deterministic: bool = False) Tensor [source]
Returns samples from the probability distribution given its parameters.
- Returns:
actions
- entropy() Tensor [source]
Returns Shannon’s entropy of the probability distribution.
- Returns:
the entropy, or None if no analytical form is known
- log_prob(actions: Tensor) Tensor [source]
Returns the log likelihood
- Parameters:
actions – the taken actions
- Returns:
The log likelihood of the distribution
- log_prob_from_params(action_logits: Tensor, deterministic: bool = False) Tuple[Tensor, Tensor] [source]
Returns samples and the associated log probabilities from the probability distribution given its parameters.
- Returns:
actions and log prob
- mode() Tensor [source]
Returns the most likely action (deterministic output) from the probability distribution
- Returns:
the deterministic action
- prob() Tensor [source]
Return the probability tensor of the distribution.
- Returns:
A probability tensor
- Return type:
torch.Tensor
- proba_distribution(action_logits: Tensor, action_mask: Optional[Tensor] = None) CategoricalDistribution [source]
Set parameters of the distribution.
- Returns:
self
- proba_distribution_net(latent_dim: int) Module [source]
Create the layer that represents the distribution: it will be the logits of the Categorical distribution. You can then get probabilities using a softmax.
- Parameters:
latent_dim – Dimension of the last layer of the policy network (before the action layer)
- Returns:
the module that outputs the logits of the Categorical distribution
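A minimal sketch of the full pipeline, from latent features to sampled actions (the optional action_mask argument is omitted; shapes are illustrative):

```python
import torch

from malib.common.distributions import CategoricalDistribution

dist = CategoricalDistribution(action_dim=5)
action_net = dist.proba_distribution_net(latent_dim=64)  # linear layer: 64 -> 5 logits
latent = torch.randn(8, 64)                              # batch of policy features
dist = dist.proba_distribution(action_net(latent))
actions = dist.get_actions(deterministic=False)          # sampled action indices
log_prob = dist.log_prob(actions)
```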
- class malib.common.distributions.DiagGaussianDistribution(action_dim: int)[source]
Bases:
Distribution
Gaussian distribution with diagonal covariance matrix, for continuous actions.
- Parameters:
action_dim – Dimension of the action space.
- actions_from_params(mean_actions: Tensor, log_std: Tensor, deterministic: bool = False) Tensor [source]
Returns samples from the probability distribution given its parameters.
- Returns:
actions
- entropy() Tensor [source]
Returns Shannon’s entropy of the probability distribution.
- Returns:
the entropy, or None if no analytical form is known
- log_prob(actions: Tensor) Tensor [source]
Get the log probabilities of actions according to the distribution. Note that you must first call the proba_distribution() method.
- Parameters:
actions – the taken actions
- Returns:
the log probabilities of the given actions
- log_prob_from_params(mean_actions: Tensor, log_std: Tensor) Tuple[Tensor, Tensor] [source]
Compute the log probability of taking an action given the distribution parameters.
- Parameters:
mean_actions – mean of the Gaussian
log_std – log of the standard deviation
- Returns:
actions and their log probability
- mode() Tensor [source]
Returns the most likely action (deterministic output) from the probability distribution
- Returns:
the deterministic action
- prob() Tensor [source]
Return the probability tensor of the distribution.
- Returns:
A probability tensor
- Return type:
torch.Tensor
- proba_distribution(mean_actions: Tensor, log_std: Tensor) DiagGaussianDistribution [source]
Create the distribution given its parameters (mean, std)
- Parameters:
mean_actions – mean of the Gaussian
log_std – log of the standard deviation
- Returns:
self
- proba_distribution_net(latent_dim: int, log_std_init: float = 0.0) Tuple[Module, Parameter] [source]
Create the layers and parameter that represent the distribution: one output will be the mean of the Gaussian, and the other parameter will be the standard deviation (the log of the std, in fact, to allow negative values).
- Parameters:
latent_dim – Dimension of the last layer of the policy (before the action layer)
log_std_init – Initial value for the log standard deviation
- Returns:
the mean action module and the log std parameter
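A minimal sketch for the continuous case; note that log_std is a free Parameter shared across the batch, not a network output (shapes are illustrative):

```python
import torch

from malib.common.distributions import DiagGaussianDistribution

dist = DiagGaussianDistribution(action_dim=2)
mean_net, log_std = dist.proba_distribution_net(latent_dim=64, log_std_init=0.0)
mean_actions = mean_net(torch.randn(8, 64))            # per-sample Gaussian means
dist = dist.proba_distribution(mean_actions, log_std)
actions = dist.get_actions()                           # unbounded continuous actions
entropy = dist.entropy()                               # analytical Gaussian entropy
```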
- class malib.common.distributions.Distribution[source]
Bases:
ABC
Abstract base class for distributions.
- abstract actions_from_params(*args, **kwargs) Tensor [source]
Returns samples from the probability distribution given its parameters.
- Returns:
actions
- abstract entropy() Optional[Tensor] [source]
Returns Shannon’s entropy of the probability distribution.
- Returns:
the entropy, or None if no analytical form is known
- get_actions(deterministic: bool = False) Tensor [source]
Return actions according to the probability distribution.
- Parameters:
deterministic – whether to return the mode of the distribution instead of a sample
- Returns:
the selected actions
- abstract log_prob(x: Tensor) Tensor [source]
Returns the log likelihood
- Parameters:
x – the taken action
- Returns:
The log likelihood of the distribution
- abstract log_prob_from_params(*args, **kwargs) Tuple[Tensor, Tensor] [source]
Returns samples and the associated log probabilities from the probability distribution given its parameters.
- Returns:
actions and log prob
- abstract mode() Tensor [source]
Returns the most likely action (deterministic output) from the probability distribution
- Returns:
the deterministic action
- abstract prob() Tensor [source]
Return the probability tensor of the distribution.
- Returns:
A probability tensor
- Return type:
torch.Tensor
- abstract proba_distribution(*args, **kwargs) Distribution [source]
Set parameters of the distribution.
- Returns:
self
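Because every concrete distribution follows the same proba_distribution / get_actions / log_prob contract, code can be written against the abstract interface. A sketch with a hypothetical helper (sample_and_score is not part of malib):

```python
import torch

from malib.common.distributions import CategoricalDistribution, Distribution

def sample_and_score(dist: Distribution, *params) -> tuple:
    """Set parameters, sample, and return (actions, log_prob)."""
    dist = dist.proba_distribution(*params)  # sets parameters, returns self
    actions = dist.get_actions(deterministic=False)
    return actions, dist.log_prob(actions)

actions, logp = sample_and_score(CategoricalDistribution(4), torch.randn(2, 4))
```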
- class malib.common.distributions.MaskedCategorical(scores, mask=None)[source]
Bases:
object
- property entropy
- property logits
- static masked_softmax(logits, mask)[source]
This method returns a valid probability distribution for each instance whose corresponding row in the mask matrix is not a zero vector; otherwise, a uniform distribution is returned for that row. This is a technical workaround that allows use of the Categorical class: if the probabilities do not sum to one, an exception is raised during sampling.
- property normalized_entropy
- property probs
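A sketch of the masked_softmax behavior described above (the mask dtype and values are illustrative assumptions): rows with at least one valid entry get a proper masked distribution, while all-zero mask rows fall back to uniform so that sampling never raises.

```python
import torch

from malib.common.distributions import MaskedCategorical

scores = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 2.0, 3.0]])
mask = torch.tensor([[1, 0, 1],
                     [0, 0, 0]], dtype=torch.bool)
probs = MaskedCategorical.masked_softmax(scores, mask)
# Row 0: probability mass only on the unmasked entries.
# Row 1: uniform fallback, since its mask row is all zeros.
```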
- class malib.common.distributions.MultiCategoricalDistribution(action_dims: List[int])[source]
Bases:
Distribution
MultiCategorical distribution for multi discrete actions.
- Parameters:
action_dims – List of sizes of discrete action spaces
- actions_from_params(action_logits: Tensor, deterministic: bool = False) Tensor [source]
Returns samples from the probability distribution given its parameters.
- Returns:
actions
- entropy() Tensor [source]
Returns Shannon’s entropy of the probability distribution.
- Returns:
the entropy, or None if no analytical form is known
- log_prob(actions: Tensor) Tensor [source]
Returns the log likelihood
- Parameters:
actions – the taken actions
- Returns:
The log likelihood of the distribution
- log_prob_from_params(action_logits: Tensor) Tuple[Tensor, Tensor] [source]
Returns samples and the associated log probabilities from the probability distribution given its parameters.
- Returns:
actions and log prob
- mode() Tensor [source]
Returns the most likely action (deterministic output) from the probability distribution
- Returns:
the deterministic action
- proba_distribution(action_logits: Tensor) MultiCategoricalDistribution [source]
Set parameters of the distribution.
- Returns:
self
- proba_distribution_net(latent_dim: int) Module [source]
Create the layer that represents the distribution: it will be the logits (flattened) of the MultiCategorical distribution. You can then get probabilities using a softmax on each sub-space.
- Parameters:
latent_dim – Dimension of the last layer of the policy network (before the action layer)
- Returns:
the module that outputs the flattened logits of the MultiCategorical distribution
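A sketch for a multi-discrete space with sub-spaces of sizes 3 and 2; the flattened logits are split per sub-space internally (shapes are illustrative):

```python
import torch

from malib.common.distributions import MultiCategoricalDistribution

dist = MultiCategoricalDistribution(action_dims=[3, 2])
action_net = dist.proba_distribution_net(latent_dim=64)  # outputs 3 + 2 = 5 logits
logits = action_net(torch.randn(8, 64))
dist = dist.proba_distribution(logits)
actions = dist.get_actions()       # one index per sub-space, e.g. shape (8, 2)
log_prob = dist.log_prob(actions)  # log probs summed over the sub-spaces
```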
- class malib.common.distributions.SquashedDiagGaussianDistribution(action_dim: int, epsilon: float = 1e-06)[source]
Bases:
DiagGaussianDistribution
Gaussian distribution with diagonal covariance matrix, followed by a squashing function (tanh) to ensure bounds.
- Parameters:
action_dim – Dimension of the action space.
epsilon – small value to avoid NaN due to numerical imprecision.
- entropy() Optional[Tensor] [source]
Returns Shannon’s entropy of the probability distribution.
- Returns:
the entropy, or None if no analytical form is known
- log_prob(actions: Tensor, gaussian_actions: Optional[Tensor] = None) Tensor [source]
Get the log probabilities of actions according to the distribution. Note that you must first call the proba_distribution() method.
- Parameters:
actions – the taken actions
gaussian_actions – optionally, the actions before the tanh squashing
- Returns:
the log probabilities of the given actions
- log_prob_from_params(mean_actions: Tensor, log_std: Tensor) Tuple[Tensor, Tensor] [source]
Compute the log probability of taking an action given the distribution parameters.
- Parameters:
mean_actions – mean of the Gaussian
log_std – log of the standard deviation
- Returns:
actions and their log probability
- mode() Tensor [source]
Returns the most likely action (deterministic output) from the probability distribution
- Returns:
the deterministic action
- proba_distribution(mean_actions: Tensor, log_std: Tensor) SquashedDiagGaussianDistribution [source]
Create the distribution given its parameters (mean, std)
- Parameters:
mean_actions – mean of the Gaussian
log_std – log of the standard deviation
- Returns:
self
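The squashed variant reuses the parent’s proba_distribution_net; a sketch showing that sampled actions land in (-1, 1) and that log_prob accounts for the tanh change of variables:

```python
import torch

from malib.common.distributions import SquashedDiagGaussianDistribution

dist = SquashedDiagGaussianDistribution(action_dim=2)
mean_net, log_std = dist.proba_distribution_net(latent_dim=64)
dist = dist.proba_distribution(mean_net(torch.randn(8, 64)), log_std)
actions = dist.get_actions()       # tanh-squashed, all values in (-1, 1)
log_prob = dist.log_prob(actions)  # includes the tanh log-det correction
```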
- class malib.common.distributions.StateDependentNoiseDistribution(action_dim: int, full_std: bool = True, use_expln: bool = False, squash_output: bool = False, learn_features: bool = False, epsilon: float = 1e-06)[source]
Bases:
Distribution
Distribution class for using generalized State Dependent Exploration (gSDE). Paper: https://arxiv.org/abs/2005.05719
It is used to create the noise exploration matrix and compute the log probability of an action with that noise.
- Parameters:
action_dim – Dimension of the action space.
full_std – Whether to use (n_features x n_actions) parameters for the std instead of only (n_features,).
use_expln – Use expln() instead of exp() to ensure a positive standard deviation (cf. the paper). It keeps the variance above zero and prevents it from growing too fast. In practice, exp() is usually enough.
squash_output – Whether to squash the output using a tanh function; this ensures bounds are satisfied.
learn_features – Whether to learn features for gSDE. This enables gradients to be backpropagated through the features latent_sde in the code.
epsilon – small value to avoid NaN due to numerical imprecision.
- actions_from_params(mean_actions: Tensor, log_std: Tensor, latent_sde: Tensor, deterministic: bool = False) Tensor [source]
Returns samples from the probability distribution given its parameters.
- Returns:
actions
- entropy() Optional[Tensor] [source]
Returns Shannon’s entropy of the probability distribution.
- Returns:
the entropy, or None if no analytical form is known
- get_std(log_std: Tensor) Tensor [source]
Get the standard deviation from the learned parameter (log of it by default). This ensures that the std is positive.
- Parameters:
log_std – log of the standard deviation
- Returns:
the (positive) standard deviation
- log_prob(actions: Tensor) Tensor [source]
Returns the log likelihood
- Parameters:
x – the taken action
- Returns:
The log likelihood of the distribution
- log_prob_from_params(mean_actions: Tensor, log_std: Tensor, latent_sde: Tensor) Tuple[Tensor, Tensor] [source]
Returns samples and the associated log probabilities from the probability distribution given its parameters.
- Returns:
actions and log prob
- mode() Tensor [source]
Returns the most likely action (deterministic output) from the probability distribution
- Returns:
the deterministic action
- proba_distribution(mean_actions: Tensor, log_std: Tensor, latent_sde: Tensor) StateDependentNoiseDistribution [source]
Create the distribution given its parameters (mean, std)
- Parameters:
mean_actions – mean of the Gaussian
log_std – log of the standard deviation
latent_sde – latent features used to compute the state-dependent noise
- Returns:
self
- proba_distribution_net(latent_dim: int, log_std_init: float = -2.0, latent_sde_dim: Optional[int] = None) Tuple[Module, Parameter] [source]
Create the layers and parameter that represent the distribution: one output will be the deterministic action, and the other parameter will be the standard deviation of the distribution that controls the weights of the noise matrix.
- Parameters:
latent_dim – Dimension of the last layer of the policy (before the action layer)
log_std_init – Initial value for the log standard deviation
latent_sde_dim – Dimension of the last layer of the features extractor for gSDE. By default, it is shared with the policy network.
- Returns:
the deterministic action module and the log std parameter
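A sketch of the gSDE workflow: the same latent features are passed as latent_sde so the noise is computed from them. This assumes proba_distribution_net also initializes the noise exploration matrix, as in the SB3 reference implementation linked above.

```python
import torch

from malib.common.distributions import StateDependentNoiseDistribution

dist = StateDependentNoiseDistribution(action_dim=2)
mean_net, log_std = dist.proba_distribution_net(latent_dim=64, log_std_init=-2.0)
latent = torch.randn(8, 64)
dist = dist.proba_distribution(mean_net(latent), log_std, latent_sde=latent)
actions = dist.get_actions()   # the exploration noise depends on the state features
```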
- class malib.common.distributions.TanhBijector(epsilon: float = 1e-06)[source]
Bases:
object
Bijective transformation of a probability distribution using a squashing function (tanh). TODO: use Pyro instead (https://pyro.ai/).
- Parameters:
epsilon – small value to avoid NaN due to numerical imprecision.
- static atanh(x: Tensor) Tensor [source]
Inverse of tanh, taken from Pyro (https://github.com/pyro-ppl/pyro): 0.5 * torch.log((1 + x) / (1 - x))
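A quick round-trip sketch of the inverse (the tolerance is illustrative and depends on epsilon):

```python
import torch

from malib.common.distributions import TanhBijector

x = torch.tensor([-0.9, 0.0, 0.5])
y = torch.tanh(x)
x_rec = TanhBijector.atanh(y)   # 0.5 * log((1 + y) / (1 - y))
assert torch.allclose(x, x_rec, atol=1e-5)
```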
- malib.common.distributions.kl_divergence(dist_true: Distribution, dist_pred: Distribution) Tensor [source]
Wrapper for the PyTorch implementation of the full form KL Divergence
- Parameters:
dist_true – the p distribution
dist_pred – the q distribution
- Returns:
KL(dist_true||dist_pred)
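A sketch using two diagonal Gaussians, which wrap torch.distributions and therefore have an analytical KL (the zero log-std means unit variance):

```python
import torch

from malib.common.distributions import DiagGaussianDistribution, kl_divergence

p = DiagGaussianDistribution(2).proba_distribution(torch.zeros(1, 2), torch.zeros(2))
q = DiagGaussianDistribution(2).proba_distribution(torch.ones(1, 2), torch.zeros(2))
kl = kl_divergence(p, q)   # KL(p || q); here 0.5 per dimension for N(0,1) vs N(1,1)
```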
- malib.common.distributions.make_proba_distribution(action_space: Space, use_sde: bool = False, dist_kwargs: Optional[Dict[str, Any]] = None) Distribution [source]
Return an instance of Distribution for the correct type of action space.
- Parameters:
action_space (gym.spaces.Space) – The action space.
use_sde (bool, optional) – Force the use of StateDependentNoiseDistribution instead of DiagGaussianDistribution. Defaults to False.
dist_kwargs (Optional[Dict[str, Any]], optional) – Keyword arguments to pass to the probability distribution. Defaults to None.
- Raises:
NotImplementedError – Probability distribution not implemented for the specified action space.
- Returns:
The appropriate Distribution object
- Return type:
Distribution
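A dispatch sketch over standard gym action spaces (use_sde only affects continuous Box spaces):

```python
from gym import spaces

from malib.common.distributions import make_proba_distribution

dist = make_proba_distribution(spaces.Discrete(4))     # -> CategoricalDistribution
box = spaces.Box(low=-1.0, high=1.0, shape=(2,))
dist_sde = make_proba_distribution(box, use_sde=True)  # -> StateDependentNoiseDistribution
```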
malib.common.manager module
malib.common.payoff_manager module
malib.common.strategy_spec module
- class malib.common.strategy_spec.StrategySpec(identifier: str, policy_ids: Tuple[str], meta_data: Dict[str, Any])[source]
Bases:
object
Construct a strategy spec.
- Parameters:
identifier (str) – Runtime id as identifier.
policy_ids (Tuple[PolicyID]) – A tuple of policy ids; may be empty.
meta_data (Dict[str, Any]) – Meta data, for policy construction.
- gen_policy(device=None) Policy [source]
Generate a policy instance with the given meta data.
- Returns:
A policy instance.
- Return type:
Policy
- get_meta_data() Dict[str, Any] [source]
Return the meta data. Keys in the meta-data include:
policy_cls: policy class type
kwargs: a dict of parameters for policy construction
experiment_tag: a string for experiment identification
optim_config: optional, a dict for optimizer construction
- Returns:
A dict of meta data.
- Return type:
Dict[str, Any]
- property num_policy: int
- register_policy_id(policy_id: str)[source]
Register a new policy id and preset its probability to 0.
- Parameters:
policy_id (PolicyID) – Policy id to register.
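A construction sketch; DummyPolicy is a stand-in for a real malib Policy subclass, and the exact kwargs depend on the policy being built:

```python
from malib.common.strategy_spec import StrategySpec

class DummyPolicy:  # placeholder for a real malib Policy subclass
    pass

spec = StrategySpec(
    identifier="runtime_0",
    policy_ids=("policy_0",),
    meta_data={
        "policy_cls": DummyPolicy,           # policy class type
        "kwargs": {},                        # parameters for policy construction
        "experiment_tag": "demo_experiment", # experiment identification
    },
)
spec.register_policy_id("policy_1")  # newly registered id starts with probability 0
print(spec.num_policy)               # expected: 2 (one initial id plus one registered)
```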