malib.models.torch package

Submodules

malib.models.torch.continuous module

class malib.models.torch.continuous.Actor(preprocess_net: Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), max_action: float = 1.0, device: Union[str, int, device] = 'cpu', preprocess_net_output_dim: Optional[int] = None)[source]

Bases: Module

Simple actor network. Creates an actor operating in a continuous action space, with structure preprocess_net —> action_shape.

Parameters:
  • preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

  • action_shape – a sequence of int for the shape of action.

  • hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).

  • max_action (float) – the scale for the final action logits. Default to 1.

  • preprocess_net_output_dim (int) – the output dimension of preprocess_net.

For advanced usage (how to customize the network), please refer to build_the_network.

See also

Please refer to :class:`~tianshou.utils.net.common.Net` as an example of how preprocess_net is suggested to be defined.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs: Union[ndarray, Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) Tuple[Tensor, Any][source]

Mapping: obs -> logits -> action.

training: bool
class malib.models.torch.continuous.ActorProb(preprocess_net: Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), max_action: float = 1.0, device: Union[str, int, device] = 'cpu', unbounded: bool = False, conditioned_sigma: bool = False, preprocess_net_output_dim: Optional[int] = None)[source]

Bases: Module

Simple actor network (outputs a Gaussian distribution over actions).

Parameters:
  • preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

  • action_shape – a sequence of int for the shape of action.

  • hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).

  • max_action (float) – the scale for the final action logits. Default to 1.

  • unbounded (bool) – whether to apply tanh activation on final logits. Default to False.

  • conditioned_sigma (bool) – True when sigma is calculated from the input, False when sigma is an independent parameter. Default to False.

  • preprocess_net_output_dim (int) – the output dimension of preprocess_net.

For advanced usage (how to customize the network), please refer to build_the_network.

See also

Please refer to :class:`~tianshou.utils.net.common.Net` as an example of how preprocess_net is suggested to be defined.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs: Union[ndarray, Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) Tuple[Tuple[Tensor, Tensor], Any][source]

Mapping: obs -> logits -> (mu, sigma).

training: bool
class malib.models.torch.continuous.Critic(preprocess_net: Module, hidden_sizes: Sequence[int] = (), device: Union[str, int, device] = 'cpu', preprocess_net_output_dim: Optional[int] = None)[source]

Bases: Module

Simple critic network. Creates a critic operating in a continuous action space, with structure preprocess_net —> 1 (Q value).

Parameters:
  • preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

  • hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).

  • preprocess_net_output_dim (int) – the output dimension of preprocess_net.

For advanced usage (how to customize the network), please refer to build_the_network.

See also

Please refer to :class:`~tianshou.utils.net.common.Net` as an example of how preprocess_net is suggested to be defined.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs: Union[ndarray, Tensor], act: Optional[Union[ndarray, Tensor]] = None, info: Dict[str, Any] = {}) Tensor[source]

Mapping: (s, a) -> logits -> Q(s, a).

training: bool
class malib.models.torch.continuous.Perturbation(preprocess_net: Module, max_action: float, device: Union[str, int, device] = 'cpu', phi: float = 0.05)[source]

Bases: Module

Implementation of the perturbation network in the BCQ algorithm. Given a state and action, it generates a perturbed action.

Parameters:
  • preprocess_net (torch.nn.Module) – a self-defined preprocess_net which outputs a flattened hidden state.

  • max_action (float) – the maximum value of each dimension of action.

  • device (Union[str, int, torch.device]) – which device to create this model on. Default to cpu.

  • phi (float) – max perturbation parameter for BCQ. Default to 0.05.

For advanced usage (how to customize the network), please refer to build_the_network.

See also

You can refer to `examples/offline/offline_bcq.py` to see how to use it.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(state: Tensor, action: Tensor) Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class malib.models.torch.continuous.RecurrentActorProb(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int], hidden_layer_size: int = 128, max_action: float = 1.0, device: Union[str, int, device] = 'cpu', unbounded: bool = False, conditioned_sigma: bool = False)[source]

Bases: Module

Recurrent version of ActorProb. For advanced usage (how to customize the network), please refer to build_the_network.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs: Union[ndarray, Tensor], state: Optional[Dict[str, Tensor]] = None, info: Dict[str, Any] = {}) Tuple[Tuple[Tensor, Tensor], Dict[str, Tensor]][source]

Almost the same as Recurrent.

training: bool
class malib.models.torch.continuous.RecurrentCritic(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int] = [0], device: Union[str, int, device] = 'cpu', hidden_layer_size: int = 128)[source]

Bases: Module

Recurrent version of Critic. For advanced usage (how to customize the network), please refer to build_the_network.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs: Union[ndarray, Tensor], act: Optional[Union[ndarray, Tensor]] = None, info: Dict[str, Any] = {}) Tensor[source]

Almost the same as Recurrent.

training: bool
class malib.models.torch.continuous.VAE(encoder: Module, decoder: Module, hidden_dim: int, latent_dim: int, max_action: float, device: Union[str, device] = 'cpu')[source]

Bases: Module

Implementation of VAE. It models the distribution of actions: given a state, it can generate actions similar to those in the batch. It is used in the BCQ algorithm.

Parameters:
  • encoder (torch.nn.Module) – the encoder in VAE. Its input_dim must be state_dim + action_dim, and output_dim must be hidden_dim.

  • decoder (torch.nn.Module) – the decoder in VAE. Its input_dim must be state_dim + latent_dim, and output_dim must be action_dim.

  • hidden_dim (int) – the size of the last linear-layer in encoder.

  • latent_dim (int) – the size of latent layer.

  • max_action (float) – the maximum value of each dimension of action.

  • device (Union[str, torch.device]) – which device to create this model on. Default to “cpu”.

For advanced usage (how to customize the network), please refer to build_the_network.

See also

You can refer to `examples/offline/offline_bcq.py` to see how to use it.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

decode(state: Tensor, latent_z: Optional[Tensor] = None) Tensor[source]
forward(state: Tensor, action: Tensor) Tuple[Tensor, Tensor, Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

malib.models.torch.discrete module

class malib.models.torch.discrete.Actor(preprocess_net: Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), softmax_output: bool = True, preprocess_net_output_dim: Optional[int] = None, device: Union[str, int, device] = 'cpu')[source]

Bases: Module

Simple actor network. Creates an actor operating in a discrete action space, with structure preprocess_net —> action_shape.

Parameters:
  • preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

  • action_shape – a sequence of int for the shape of action.

  • hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).

  • softmax_output (bool) – whether to apply a softmax layer over the last layer’s output.

  • preprocess_net_output_dim (int) – the output dimension of preprocess_net.

For advanced usage (how to customize the network), please refer to build_the_network.

See also

Please refer to :class:`~tianshou.utils.net.common.Net` as an example of how preprocess_net is suggested to be defined.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs: Union[ndarray, Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) Tuple[Tensor, Any][source]

Mapping: s -> Q(s, *).

training: bool
class malib.models.torch.discrete.CosineEmbeddingNetwork(num_cosines: int, embedding_dim: int)[source]

Bases: Module

Cosine embedding network for IQN. Converts a scalar in [0, 1] to a list of n-dim vectors.

Parameters:
  • num_cosines – the number of cosines used for the embedding.

  • embedding_dim – the dimension of the embedding/output.

Note

From https://github.com/ku2482/fqf-iqn-qrdqn.pytorch/blob/master/fqf_iqn_qrdqn/network.py.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(taus: Tensor) Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
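The cosine features underlying this embedding can be sketched directly in torch (a conceptual illustration, not the module's exact code; the linear layer and ReLU that follow the cosine features are omitted):

```python
import math

import torch

num_cosines = 64
taus = torch.rand(5, 8)  # (batch_size, N) quantile fractions in [0, 1]

# cos(i * pi * tau) for i = 1..num_cosines, giving one num_cosines-dim vector per tau
i_pi = math.pi * torch.arange(1, num_cosines + 1).float()
features = torch.cos(taus.unsqueeze(-1) * i_pi)  # shape (5, 8, 64)
```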
class malib.models.torch.discrete.Critic(preprocess_net: Module, hidden_sizes: Sequence[int] = (), last_size: int = 1, preprocess_net_output_dim: Optional[int] = None, device: Union[str, int, device] = 'cpu')[source]

Bases: Module

Simple critic network. Creates a critic operating in a discrete action space, with structure preprocess_net —> 1 (Q value).

Parameters:
  • preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

  • hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).

  • last_size (int) – the output dimension of Critic network. Default to 1.

  • preprocess_net_output_dim (int) – the output dimension of preprocess_net.

For advanced usage (how to customize the network), please refer to build_the_network.

See also

Please refer to :class:`~tianshou.utils.net.common.Net` as an example of how preprocess_net is suggested to be defined.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs: Union[ndarray, Tensor], **kwargs: Any) Tensor[source]

Mapping: s -> V(s).

training: bool
class malib.models.torch.discrete.FractionProposalNetwork(num_fractions: int, embedding_dim: int)[source]

Bases: Module

Fraction proposal network for FQF.

Parameters:
  • num_fractions – the number of fractions to propose.

  • embedding_dim – the dimension of the embedding/input.

Note

Adapted from https://github.com/ku2482/fqf-iqn-qrdqn.pytorch/blob/master/fqf_iqn_qrdqn/network.py.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs_embeddings: Tensor) Tuple[Tensor, Tensor, Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class malib.models.torch.discrete.ImplicitQuantileNetwork(preprocess_net: Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), num_cosines: int = 64, preprocess_net_output_dim: Optional[int] = None, device: Union[str, int, device] = 'cpu')[source]

Bases: Critic

Implicit Quantile Network.

Parameters:
  • preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

  • action_shape – a sequence of int for the shape of action.

  • hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).

  • num_cosines (int) – the number of cosines to use for cosine embedding. Default to 64.

  • preprocess_net_output_dim (int) – the output dimension of preprocess_net.

Note

Although this class inherits Critic, it is actually a quantile Q-network with output shape (batch_size, action_dim, sample_size). The second item of the first return value is the tau vector.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs: Union[ndarray, Tensor], sample_size: int, **kwargs: Any) Tuple[Any, Tensor][source]

Mapping: s -> Q(s, *).

training: bool
class malib.models.torch.discrete.IntrinsicCuriosityModule(feature_net: Module, feature_dim: int, action_dim: int, hidden_sizes: Sequence[int] = (), device: Union[str, device] = 'cpu')[source]

Bases: Module

Implementation of the Intrinsic Curiosity Module. arXiv:1705.05363.

Parameters:
  • feature_net (torch.nn.Module) – a self-defined feature_net which outputs a flattened hidden state.

  • feature_dim (int) – input dimension of the feature net.

  • action_dim (int) – dimension of the action space.

  • hidden_sizes – hidden layer sizes for forward and inverse models.

  • device – device for the module.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(s1: Union[ndarray, Tensor], act: Union[ndarray, Tensor], s2: Union[ndarray, Tensor], **kwargs: Any) Tuple[Tensor, Tensor][source]

Mapping: s1, act, s2 -> mse_loss, act_hat.

training: bool
class malib.models.torch.discrete.NoisyLinear(in_features: int, out_features: int, noisy_std: float = 0.5)[source]

Bases: Module

Implementation of Noisy Networks. arXiv:1706.10295.

Parameters:
  • in_features (int) – the number of input features.

  • out_features (int) – the number of output features.

  • noisy_std (float) – initial standard deviation of noisy linear layers.

Note

Adapted from https://github.com/ku2482/fqf-iqn-qrdqn.pytorch/blob/master/fqf_iqn_qrdqn/network.py.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

f(x: Tensor) Tensor[source]
forward(x: Tensor) Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset() None[source]
sample() None[source]
training: bool
malib.models.torch.discrete.sample_noise(model: Module) bool[source]

Sample the random noises of NoisyLinear modules in the model.

Parameters:
  • model – a PyTorch module which may have NoisyLinear submodules.

Returns:

True if the model has at least one NoisyLinear submodule; otherwise, False.

malib.models.torch.net module

class malib.models.torch.net.ActorCritic(actor: Module, critic: Module)[source]

Bases: Module

An actor-critic network for parsing parameters. Use actor_critic.parameters() instead of set.union or list + list concatenation to avoid issue #449.

Parameters:
  • actor (nn.Module) – the actor network.

  • critic (nn.Module) – the critic network.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

training: bool
class malib.models.torch.net.DataParallelNet(net: Module)[source]

Bases: Module

DataParallel wrapper for training an agent with multiple GPUs. This class only converts the input data type from a numpy array to a torch Tensor. If the input is a nested dictionary, the user should create a similar class to do the same thing.

Parameters:
  • net (nn.Module) – the network to be distributed across different GPUs.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs: Union[ndarray, Tensor], *args: Any, **kwargs: Any) Tuple[Any, Any][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class malib.models.torch.net.MLP(input_dim: int, output_dim: int = 0, hidden_sizes: ~typing.Sequence[int] = (), norm_layer: ~typing.Optional[~typing.Union[~typing.Type[~torch.nn.modules.module.Module], ~typing.Sequence[~typing.Type[~torch.nn.modules.module.Module]]]] = None, activation: ~typing.Optional[~typing.Union[~typing.Type[~torch.nn.modules.module.Module], ~typing.Sequence[~typing.Type[~torch.nn.modules.module.Module]]]] = <class 'torch.nn.modules.activation.ReLU'>, device: ~typing.Optional[~typing.Union[str, int, ~torch.device]] = None, linear_layer: ~typing.Type[~torch.nn.modules.linear.Linear] = <class 'torch.nn.modules.linear.Linear'>)[source]

Bases: Module

Creates an MLP.

Parameters:
  • input_dim (int) – dimension of the input vector.

  • output_dim (int, optional) – dimension of the output vector. If set to 0, there is no final linear layer. Defaults to 0.

  • hidden_sizes (Sequence[int], optional) – shape of MLP passed in as a list, not including input_dim and output_dim. Defaults to ().

  • norm_layer (Optional[Union[ModuleType, Sequence[ModuleType]]], optional) – which normalization to use before activation, e.g., nn.LayerNorm and nn.BatchNorm1d. You can also pass a list of normalization modules with the same length as hidden_sizes to use different normalization modules in different layers. Defaults to None (no normalization).

  • activation (Optional[Union[ModuleType, Sequence[ModuleType]]], optional) – which activation to use after each layer, can be both the same activation for all layers if passed in nn.Module, or different activation for different Modules if passed in a list. Defaults to nn.ReLU.

  • device (Optional[Union[str, int, torch.device]], optional) – which device to create this model on. Defaults to None.

  • linear_layer (Type[nn.Linear], optional) – use this module as linear layer. Defaults to nn.Linear.

forward(obs: Union[ndarray, Tensor]) Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class malib.models.torch.net.Net(state_shape: ~typing.Union[int, ~typing.Sequence[int]], action_shape: ~typing.Union[int, ~typing.Sequence[int]] = 0, hidden_sizes: ~typing.Sequence[int] = (), norm_layer: ~typing.Optional[~typing.Type[~torch.nn.modules.module.Module]] = None, activation: ~typing.Optional[~typing.Type[~torch.nn.modules.module.Module]] = <class 'torch.nn.modules.activation.ReLU'>, device: ~typing.Union[str, int, ~torch.device] = 'cpu', softmax: bool = False, concat: bool = False, num_atoms: int = 1, dueling_param: ~typing.Optional[~typing.Tuple[~typing.Dict[str, ~typing.Any], ~typing.Dict[str, ~typing.Any]]] = None)[source]

Bases: Module

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs: Union[ndarray, Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) Tuple[Tensor, Any][source]

Mapping: obs -> flatten (inside MLP)-> logits.

training: bool
class malib.models.torch.net.Recurrent(layer_num: int, state_shape: Union[int, Sequence[int]], action_shape: Union[int, Sequence[int]], device: Union[str, int, device] = 'cpu', hidden_layer_size: int = 128)[source]

Bases: Module

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs: Union[ndarray, Tensor], state: Optional[Dict[str, Tensor]] = None, info: Dict[str, Any] = {}) Tuple[Tensor, Dict[str, Tensor]][source]

Mapping: obs -> flatten -> logits. In evaluation mode, obs should have shape [bsz, dim]; in training mode, obs should have shape [bsz, len, dim]. See the code and comments for more detail.

training: bool
malib.models.torch.net.make_net(observation_space: Space, action_space: Space, device: Type[device], net_type: Optional[str] = None, **kwargs) Module[source]

Create a network instance with specific network configuration.

Parameters:
  • observation_space (gym.Space) – The observation space, used to determine which network type will be used if net_type is not specified.

  • action_space (gym.Space) – The action space, used to determine the network output dim if output_dim or action_shape is not given in kwargs.

  • device (Device) – The device on which to allocate the network.

  • net_type (str, optional) – The network type; one of {mlp, net, rnn, actor_critic, data_parallel}.

Raises:

ValueError – Unexpected network type.

Returns:

A network instance.

Return type:

nn.Module

malib.models.torch.net.miniblock(input_size: int, output_size: int = 0, norm_layer: ~typing.Optional[~typing.Type[~torch.nn.modules.module.Module]] = None, activation: ~typing.Optional[~typing.Type[~torch.nn.modules.module.Module]] = None, linear_layer: ~typing.Type[~torch.nn.modules.linear.Linear] = <class 'torch.nn.modules.linear.Linear'>) List[Module][source]

Construct a miniblock with given input/output-size, norm layer and activation.

Parameters:
  • input_size (int) – The input size.

  • output_size (int, optional) – The output size. Defaults to 0.

  • norm_layer (Optional[ModuleType], optional) – A nn.Module used as the normalization layer. Defaults to None.

  • activation (Optional[ModuleType], optional) – A nn.Module used as the activation layer. Defaults to None.

  • linear_layer (Type[nn.Linear], optional) – A nn.Module used as the linear layer. Defaults to nn.Linear.

Returns:

A list of layers.

Return type:

List[nn.Module]