malib.rollout.envs package

Subpackages

Submodules

malib.rollout.envs.env module

class malib.rollout.envs.env.Environment(**configs)

Bases: object

static action_adapter(policy_outputs: Dict[str, Dict[str, Any]], **kwargs)

Convert policy actions to environment actions. By default, the policy action is used as the environment action.
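A subclass can override action_adapter when the simulator expects a different action format. The sketch below assumes, hypothetically, that policy_outputs maps each agent ID to a dict holding an "action" entry; check the layout your policies actually produce. DiscreteEnv is a hypothetical stand-in, not MALib code.

```python
from typing import Any, Dict


class DiscreteEnv:
    """Hypothetical stand-in for an Environment subclass (not MALib code)."""

    @staticmethod
    def action_adapter(policy_outputs: Dict[str, Dict[str, Any]], **kwargs) -> Dict[str, Any]:
        # Assumed layout: {agent_id: {"action": ..., <other policy outputs>}}.
        # Keep only the action and cast it to a plain int for the simulator.
        return {aid: int(out["action"]) for aid, out in policy_outputs.items()}


outputs = {"agent_0": {"action": 2.0, "logits": [0.1, 0.9]}, "agent_1": {"action": 0}}
print(DiscreteEnv.action_adapter(outputs))  # {'agent_0': 2, 'agent_1': 0}
```

Making the method static mirrors the documented signature: the conversion depends only on the policy outputs, not on environment state.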

property action_spaces: Dict[str, Space]

A dict of agent action spaces

close()
collect_info() → Dict[str, Any]
env_done_check(agent_dones: Dict[str, bool]) → bool
property observation_spaces: Dict[str, Space]

A dict of agent observation spaces

property possible_agents: List[str]

Return a list of environment agent ids

record_episode_info_step(state: Any, observations: Dict[str, Any], rewards: Dict[str, Any], dones: Dict[str, bool], infos: Any)

Analyze a timestep and record it as episode information.

Parameters:
  • state (Any) – Environment state.

  • observations (Dict[AgentID, Any]) – A dict of agent observations.

  • rewards (Dict[AgentID, Any]) – A dict of agent rewards.

  • dones (Dict[AgentID, bool]) – A dict of done signals.

  • infos (Any) – Additional step information.
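The following is a minimal sketch of what an episode-info handler might record at each step: cumulative reward per agent and a step counter. EpisodeRecorder is hypothetical, and MALib's own handler may track different statistics; collect_info is reused here only as a natural name for the summary.

```python
from collections import defaultdict
from typing import Any, Dict


class EpisodeRecorder:
    """Hypothetical episode-info handler; MALib's own handler may differ."""

    def __init__(self) -> None:
        self.episode_reward: Dict[str, float] = defaultdict(float)
        self.step_count = 0

    def record_episode_info_step(self, state: Any, observations: Dict[str, Any],
                                 rewards: Dict[str, Any], dones: Dict[str, bool],
                                 infos: Any) -> None:
        # Accumulate each agent's reward and count the timestep.
        for aid, r in rewards.items():
            self.episode_reward[aid] += float(r)
        self.step_count += 1

    def collect_info(self) -> Dict[str, Any]:
        # Summarize the episode recorded so far.
        return {"step": self.step_count, "reward": dict(self.episode_reward)}


rec = EpisodeRecorder()
rec.record_episode_info_step(None, {"a": 0}, {"a": 1.0}, {"a": False}, {})
rec.record_episode_info_step(None, {"a": 1}, {"a": 0.5}, {"a": True}, {})
print(rec.collect_info())  # {'step': 2, 'reward': {'a': 1.5}}
```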

render(*args, **kwargs)
reset(max_step: Optional[int] = None) → Union[None, Sequence[Dict[str, Any]]]

Reset the environment and the episode info handler.

seed(seed: Optional[int] = None)
step(actions: Dict[str, Any]) → Tuple[Dict[str, Any], Dict[str, Any], Dict[str, float], Dict[str, bool], Any]

Return a 5-tuple (state, observation, reward, done, info). Each item except info is a dict mapping agent IDs to per-agent values; info may be of any type.

Note

If state output is not activated for this environment, the returned state is None.

Parameters:

actions (Dict[AgentID, Any]) – A dict of agent actions.

Returns:

A tuple in the order (state, observation, reward, done, info).

Return type:

Tuple[Dict[AgentID, Any], Dict[AgentID, Any], Dict[AgentID, float], Dict[AgentID, bool], Any]
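A typical rollout loop calls reset once and then step until every agent is done, unpacking the 5-tuple in the documented order. CountdownEnv below is a hypothetical two-agent toy that only mimics this contract; it is not a MALib class, and the real reset may return a different sequence.

```python
from typing import Any, Dict, Tuple


class CountdownEnv:
    """Hypothetical two-agent environment mimicking the documented step() contract."""

    possible_agents = ["agent_0", "agent_1"]

    def reset(self, max_step: int = 3) -> Tuple[Dict[str, Any]]:
        self.remaining = max_step
        # Return a one-element tuple holding the initial observations.
        return ({aid: self.remaining for aid in self.possible_agents},)

    def step(self, actions: Dict[str, Any]):
        self.remaining -= 1
        obs = {aid: self.remaining for aid in self.possible_agents}
        rewards = {aid: float(actions[aid]) for aid in self.possible_agents}
        dones = {aid: self.remaining == 0 for aid in self.possible_agents}
        state = None  # state output is not activated in this sketch
        return state, obs, rewards, dones, {}


env = CountdownEnv()
(obs,) = env.reset(max_step=3)
total = {aid: 0.0 for aid in env.possible_agents}
done = False
while not done:
    actions = {aid: 1 for aid in obs}  # constant action, for illustration only
    state, obs, rewards, dones, info = env.step(actions)
    for aid, r in rewards.items():
        total[aid] += r
    done = all(dones.values())
print(total)  # {'agent_0': 3.0, 'agent_1': 3.0}
```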

time_step(actions: Dict[str, Any]) → Tuple[Dict[str, Any], Dict[str, float], Dict[str, bool], Dict[str, Any]]

Environment stepping logic.

Parameters:

actions (Dict[AgentID, Any]) – Agent action dict.

Raises:

NotImplementedError – Raised if this method is not implemented.

Returns:

A 4-tuple (observations, rewards, dones, infos).

Return type:

Tuple[Dict[AgentID, Any], Dict[AgentID, float], Dict[AgentID, bool], Dict[AgentID, Any]]
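One way to read the split between step and time_step: a subclass implements only the 4-tuple stepping logic, and step adds the state slot in front. The sketch below assumes that relationship; BaseEnv and MatchingPennies are hypothetical, and the real Environment.step may do more (for example, episode bookkeeping).

```python
from typing import Any, Dict, Tuple


class BaseEnv:
    """Hypothetical base class; the real Environment.step may do more work."""

    def time_step(self, actions: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, float], Dict[str, bool], Dict[str, Any]]:
        raise NotImplementedError

    def step(self, actions: Dict[str, Any]):
        obs, rewards, dones, infos = self.time_step(actions)
        # No state output in this sketch, so the first element is None.
        return None, obs, rewards, dones, infos


class MatchingPennies(BaseEnv):
    """Subclass that only implements time_step: one-shot matching pennies."""

    def time_step(self, actions):
        same = actions["p0"] == actions["p1"]
        rewards = {"p0": 1.0 if same else -1.0, "p1": -1.0 if same else 1.0}
        obs = {"p0": actions["p1"], "p1": actions["p0"]}  # each sees the opponent's move
        dones = {"p0": True, "p1": True}
        return obs, rewards, dones, {}


print(MatchingPennies().step({"p0": 0, "p1": 0}))
# (None, {'p0': 0, 'p1': 0}, {'p0': 1.0, 'p1': -1.0}, {'p0': True, 'p1': True}, {})
```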

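class malib.rollout.envs.env.GroupWrapper(env: Environment, aid_to_gid: Dict[str, str], agent_groups: Dict[str, List[str]])

Bases: Wrapper

Construct a wrapper for a given environment instance.

Parameters:
  • env (Environment) – Environment instance.

  • aid_to_gid (Dict[AgentID, str]) – A dict mapping each agent ID to its group ID.

  • agent_groups (Dict[str, List[AgentID]]) – A dict mapping each group ID to the list of its agent IDs.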

action_mask_extract(raw_observations: Dict[str, Any])
property action_spaces: Dict[str, Space]

A dict of agent action spaces

property agent_groups: Dict[str, List[str]]
agent_to_group(agent_id: str) → str

Map an agent ID to its group ID.

Parameters:

agent_id (AgentID) – Agent id.

Returns:

Group id.

Return type:

str
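The two constructor mappings are presumably inverses of each other: agent_groups lists the members of each group, and aid_to_gid answers the reverse question that agent_to_group exposes. The sketch below builds one from the other under that assumption; the group and agent names are hypothetical.

```python
from typing import Dict, List

# Hypothetical grouping: two "red" agents share a group, one "blue" agent is alone.
agent_groups: Dict[str, List[str]] = {"red": ["red_0", "red_1"], "blue": ["blue_0"]}

# Invert group -> agents into the agent -> group mapping passed as aid_to_gid.
aid_to_gid: Dict[str, str] = {
    aid: gid for gid, members in agent_groups.items() for aid in members
}


def agent_to_group(agent_id: str) -> str:
    # A plain dictionary lookup; raises KeyError for unknown agents.
    return aid_to_gid[agent_id]


print(agent_to_group("red_1"))  # red
```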

build_state_from_observation(agent_observation: Dict[str, Any]) → Dict[str, ndarray]

Build state from raw observation.

Parameters:

agent_observation (Dict[AgentID, Any]) – A dict of agent observation.

Raises:

NotImplementedError – Raised if this method is not implemented.

Returns:

A dict of states.

Return type:

Dict[str, np.ndarray]
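Because GroupWrapper leaves build_state_from_observation to the user, a common choice (assumed here, not prescribed by the source) is to key states by group ID and concatenate the flattened observations of each group's members. The grouping and observation values below are hypothetical.

```python
from typing import Any, Dict, List

import numpy as np

# Hypothetical grouping, as it might be passed to GroupWrapper(agent_groups=...).
agent_groups: Dict[str, List[str]] = {"red": ["red_0", "red_1"], "blue": ["blue_0"]}


def build_state_from_observation(agent_observation: Dict[str, Any]) -> Dict[str, np.ndarray]:
    # Assumed rule: a group's state is the concatenation of its members'
    # flattened observations, in the order listed in agent_groups.
    return {
        gid: np.concatenate(
            [np.asarray(agent_observation[aid], dtype=np.float32).ravel() for aid in members]
        )
        for gid, members in agent_groups.items()
    }


obs = {"red_0": [1, 2], "red_1": [3, 4], "blue_0": [[5, 6], [7, 8]]}
states = build_state_from_observation(obs)
print(states["red"])         # [1. 2. 3. 4.]
print(states["blue"].shape)  # (4,)
```

With this rule, the matching state space for each group has a shape equal to the summed sizes of its members' observation spaces.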

build_state_spaces() → Dict[str, Space]

Build the state space of each group, using self.group_to_agents to look up the agents in each group.

env_done_check(agent_dones: Dict[str, bool]) → bool
property observation_spaces: Dict[str, Space]

A dict of agent observation spaces

property possible_agents: List[str]

Return a list of environment agent ids

record_episode_info_step(observations, rewards, dones, infos)

Analyze a timestep and record it as episode information.

Parameters:
  • observations (Dict[AgentID, Any]) – A dict of agent observations.

  • rewards (Dict[AgentID, Any]) – A dict of agent rewards.

  • dones (Dict[AgentID, bool]) – A dict of done signals.

  • infos (Any) – Additional step information.

reset(max_step: Optional[int] = None) → Union[None, Dict[str, Dict[str, Any]]]

Reset the environment and the episode info handler.

property state_spaces: Dict[str, Space]

Return a dict of group state spaces.

Note

Users must implement the method build_state_spaces.

Returns:

A dict of state spaces.

Return type:

Dict[str, gym.Space]

time_step(actions: Dict[str, Any])

Environment stepping logic.

Parameters:

actions (Dict[AgentID, Any]) – Agent action dict.

Raises:

NotImplementedError – Raised if this method is not implemented.

Returns:

A 4-tuple (observations, rewards, dones, infos).

Return type:

Tuple[Dict[AgentID, Any], Dict[AgentID, float], Dict[AgentID, bool], Dict[AgentID, Any]]

class malib.rollout.envs.env.Wrapper(env: Environment)

Bases: Environment

Wraps an environment to allow modular transformations.

Construct a wrapper for a given environment instance.

Parameters:

env (Environment) – Environment instance.
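The wrapper idea is to hold an inner environment, forward everything unchanged by default, and override only the piece being transformed. The standalone sketch below scales rewards returned by step; InnerEnv and RewardScaleWrapper are hypothetical and do not subclass MALib's Wrapper.

```python
from typing import Any, Dict


class InnerEnv:
    """Hypothetical environment returning the documented 5-tuple."""

    possible_agents = ["a", "b"]

    def step(self, actions: Dict[str, Any]):
        rewards = {aid: float(act) for aid, act in actions.items()}
        dones = {aid: False for aid in actions}
        return None, dict(actions), rewards, dones, {}


class RewardScaleWrapper:
    """Sketch of the wrapper idea: hold an inner env, forward calls, transform one output."""

    def __init__(self, env, scale: float) -> None:
        self.env = env
        self.scale = scale

    @property
    def possible_agents(self):
        # Unchanged attributes are forwarded to the wrapped environment.
        return self.env.possible_agents

    def step(self, actions: Dict[str, Any]):
        state, obs, rewards, dones, info = self.env.step(actions)
        scaled = {aid: r * self.scale for aid, r in rewards.items()}
        return state, obs, scaled, dones, info


env = RewardScaleWrapper(InnerEnv(), scale=0.5)
print(env.step({"a": 2, "b": 4})[2])  # {'a': 1.0, 'b': 2.0}
```

Wrappers compose: since each one exposes the same interface as the environment it wraps, several transformations can be stacked.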

property action_spaces: Dict[str, Space]

A dict of agent action spaces

close()
collect_info() → Dict[str, Any]
property observation_spaces: Dict[str, Space]

A dict of agent observation spaces

property possible_agents: List[str]

Return a list of environment agent ids

render(*args, **kwargs)
reset(max_step: Optional[int] = None) → Union[None, Tuple[Dict[str, Any]]]

Reset the environment and the episode info handler.

seed(seed: Optional[int] = None)
step(actions: Dict[str, Any]) → Tuple[Dict[str, Any], Dict[str, Any], Dict[str, float], Dict[str, bool], Any]

Return a 5-tuple (state, observation, reward, done, info). Each item except info is a dict mapping agent IDs to per-agent values; info may be of any type.

Note

If state output is not activated for this environment, the returned state is None.

Parameters:

actions (Dict[AgentID, Any]) – A dict of agent actions.

Returns:

A tuple in the order (state, observation, reward, done, info).

Return type:

Tuple[Dict[AgentID, Any], Dict[AgentID, Any], Dict[AgentID, float], Dict[AgentID, bool], Any]

malib.rollout.envs.vector_env module