malib.rollout.envs package
Subpackages
- malib.rollout.envs.gr_football package
- malib.rollout.envs.gym package
- malib.rollout.envs.mdp package
- malib.rollout.envs.open_spiel package
- malib.rollout.envs.pettingzoo package
- malib.rollout.envs.sc2 package
Submodules
malib.rollout.envs.env module
- class malib.rollout.envs.env.Environment(**configs)[source]
Bases:
object
- static action_adapter(policy_outputs: Dict[str, Dict[str, Any]], **kwargs)[source]
Convert policy outputs to environment actions. By default, the policy action is used.
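As a rough illustration of the conversion described above, the sketch below pulls one action per agent out of nested policy outputs. The `"action"` key, the `"logits"` entry, and the agent ids are assumptions made for this example; the reference does not confirm them.

```python
from typing import Any, Dict


def action_adapter(policy_outputs: Dict[str, Dict[str, Any]], **kwargs) -> Dict[str, Any]:
    """Hypothetical adapter: pick the "action" entry from each agent's policy output."""
    # Assumption: every per-agent output dict stores its action under "action".
    return {agent_id: output["action"] for agent_id, output in policy_outputs.items()}


policy_outputs = {
    "agent_0": {"action": 1, "logits": [0.2, 0.8]},
    "agent_1": {"action": 0, "logits": [0.9, 0.1]},
}
env_actions = action_adapter(policy_outputs)
print(env_actions)
```

An environment with a different action encoding would presumably override this adapter to do its own conversion.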
- property action_spaces: Dict[str, Space]
A dict of agent action spaces
- property observation_spaces: Dict[str, Space]
A dict of agent observation spaces
- property possible_agents: List[str]
Return a list of environment agent ids
- record_episode_info_step(state: Any, observations: Dict[str, Any], rewards: Dict[str, Any], dones: Dict[str, bool], infos: Any)[source]
Analyze timestep and record it as episode information.
- Parameters:
state (Any) – Environment state.
observations (Dict[AgentID, Any]) – A dict of agent observations
rewards (Dict[AgentID, Any]) – A dict of agent rewards.
dones (Dict[AgentID, bool]) – A dict of done signals.
infos (Any) – Information.
- reset(max_step: Optional[int] = None) → Union[None, Sequence[Dict[str, Any]]][source]
Reset environment and the episode info handler here.
- step(actions: Dict[str, Any]) → Tuple[Dict[str, Any], Dict[str, Any], Dict[str, float], Dict[str, bool], Any][source]
Return a 5-tuple of (state, observation, reward, done, info). Each item is a dict mapping agent ids to the corresponding entity.
Note
If state return is not activated for this environment, the returned state is None.
- Parameters:
actions (Dict[AgentID, Any]) – A dict of agent actions.
- Returns:
A tuple in the order (state, observation, reward, done, info).
- Return type:
Tuple[ Dict[AgentID, Any], Dict[AgentID, Any], Dict[AgentID, float], Dict[AgentID, bool], Any]
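The following sketch shows one way a caller might consume the 5-tuple returned by `step`, including the case where the state is None. `StubEnv` is a stand-in written for this example with the same return shape; it is not MALib's `Environment`, and its horizon and reward values are made up.

```python
from typing import Any, Dict, Tuple


class StubEnv:
    """Stand-in with the same step() return shape; not MALib's Environment."""

    def __init__(self, horizon: int = 3):
        self.agents = ["agent_0", "agent_1"]
        self.horizon = horizon
        self.cnt = 0

    def step(
        self, actions: Dict[str, Any]
    ) -> Tuple[Any, Dict[str, Any], Dict[str, float], Dict[str, bool], Any]:
        self.cnt += 1
        done = self.cnt >= self.horizon
        observations = {aid: self.cnt for aid in self.agents}
        rewards = {aid: 1.0 for aid in self.agents}
        dones = {aid: done for aid in self.agents}
        # State return is not activated in this stub, so the state is None.
        return None, observations, rewards, dones, {}


env = StubEnv(horizon=3)
returns = {aid: 0.0 for aid in env.agents}
finished = False
while not finished:
    actions = {aid: 0 for aid in env.agents}  # placeholder actions
    state, observations, rewards, dones, infos = env.step(actions)
    if state is not None:
        pass  # a consumer of global state would use it here
    for aid, reward in rewards.items():
        returns[aid] += reward
    finished = all(dones.values())
print(returns)
```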
- time_step(actions: Dict[str, Any]) → Tuple[Dict[str, Any], Dict[str, float], Dict[str, bool], Dict[str, Any]][source]
Environment stepping logic.
- Parameters:
actions (Dict[AgentID, Any]) – Agent action dict.
- Raises:
NotImplementedError – Raised if the subclass does not implement this method.
- Returns:
A 4-tuple of (observations, rewards, dones, infos).
- Return type:
Tuple[Dict[AgentID, Any], Dict[AgentID, float], Dict[AgentID, bool], Dict[AgentID, Any]]
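A subclass is expected to supply the stepping logic. The sketch below is a minimal example under stated assumptions: `BaseEnv` is a stand-in for the documented interface (not MALib's class), and `CountingEnv`, its `max_step` handling, and its reward rule are hypothetical.

```python
from typing import Any, Dict, List, Tuple


class BaseEnv:
    """Minimal stand-in for the documented interface; not MALib's class."""

    def time_step(self, actions: Dict[str, Any]):
        raise NotImplementedError


class CountingEnv(BaseEnv):
    """Hypothetical two-agent environment that ends after `max_step` steps."""

    def __init__(self, max_step: int = 2):
        self.possible_agents: List[str] = ["agent_0", "agent_1"]
        self.max_step = max_step
        self.cnt = 0

    def time_step(
        self, actions: Dict[str, Any]
    ) -> Tuple[Dict[str, Any], Dict[str, float], Dict[str, bool], Dict[str, Any]]:
        self.cnt += 1
        done = self.cnt >= self.max_step
        observations = {aid: self.cnt for aid in self.possible_agents}
        # Toy reward rule: the reward equals the numeric action.
        rewards = {aid: float(actions[aid]) for aid in self.possible_agents}
        dones = {aid: done for aid in self.possible_agents}
        infos = {aid: {} for aid in self.possible_agents}
        return observations, rewards, dones, infos


env = CountingEnv(max_step=2)
first = env.time_step({"agent_0": 1, "agent_1": 0})
second = env.time_step({"agent_0": 0, "agent_1": 2})
print(first)
print(second)
```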
- class malib.rollout.envs.env.GroupWrapper(env: Environment, aid_to_gid: Dict[str, str], agent_groups: Dict[str, List[str]])[source]
Bases:
Wrapper
Construct a wrapper for a given environment instance.
- Parameters:
env (Environment) – Environment instance.
- property action_spaces: Dict[str, Space]
A dict of agent action spaces
- property agent_groups: Dict[str, List[str]]
- agent_to_group(agent_id: str) → str[source]
Map an agent id to its group id.
- Parameters:
agent_id (AgentID) – Agent id.
- Returns:
Group id.
- Return type:
str
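To make the two grouping arguments of `GroupWrapper` concrete, the sketch below shows how `aid_to_gid` and `agent_groups` could relate to each other. The helper `build_agent_groups` and the agent and group ids are hypothetical; the reference does not say how callers should construct these dicts.

```python
from typing import Dict, List


def build_agent_groups(aid_to_gid: Dict[str, str]) -> Dict[str, List[str]]:
    """Hypothetical helper: invert an agent-to-group map into group membership lists."""
    agent_groups: Dict[str, List[str]] = {}
    for agent_id, group_id in aid_to_gid.items():
        agent_groups.setdefault(group_id, []).append(agent_id)
    return agent_groups


aid_to_gid = {"red_0": "red", "red_1": "red", "blue_0": "blue"}
agent_groups = build_agent_groups(aid_to_gid)


def agent_to_group(agent_id: str) -> str:
    """Look up the group id for one agent (a plain dict lookup in this sketch)."""
    return aid_to_gid[agent_id]


print(agent_groups)
print(agent_to_group("red_1"))
```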
- build_state_from_observation(agent_observation: Dict[str, Any]) → Dict[str, ndarray][source]
Build state from raw observation.
- Parameters:
agent_observation (Dict[AgentID, Any]) – A dict of agent observation.
- Raises:
NotImplementedError – Not implemented error
- Returns:
A dict of states.
- Return type:
Dict[str, np.ndarray]
- property observation_spaces: Dict[str, Space]
A dict of agent observation spaces
- property possible_agents: List[str]
Return a list of environment agent ids
- record_episode_info_step(observations, rewards, dones, infos)[source]
Analyze timestep and record it as episode information.
- Parameters:
state (Any) – Environment state.
observations (Dict[AgentID, Any]) – A dict of agent observations
rewards (Dict[AgentID, Any]) – A dict of agent rewards.
dones (Dict[AgentID, bool]) – A dict of done signals.
infos (Any) – Information.
- reset(max_step: Optional[int] = None) → Union[None, Dict[str, Dict[str, Any]]][source]
Reset environment and the episode info handler here.
- property state_spaces: Dict[str, Space]
Return a dict of group state spaces.
Note
Users must implement the method build_state_space.
- Returns:
A dict of state spaces.
- Return type:
Dict[str, gym.Space]
- time_step(actions: Dict[str, Any])[source]
Environment stepping logic.
- Parameters:
actions (Dict[AgentID, Any]) – Agent action dict.
- Raises:
NotImplementedError – Raised if the subclass does not implement this method.
- Returns:
A 4-tuple of (observations, rewards, dones, infos).
- Return type:
Tuple[Dict[AgentID, Any], Dict[AgentID, float], Dict[AgentID, bool], Dict[AgentID, Any]]
- class malib.rollout.envs.env.Wrapper(env: Environment)[source]
Bases:
Environment
Wraps the environment to allow a modular transformation.
Construct a wrapper for a given environment instance.
- Parameters:
env (Environment) – Environment instance.
- property action_spaces: Dict[str, Space]
A dict of agent action spaces
- property observation_spaces: Dict[str, Space]
A dict of agent observation spaces
- property possible_agents: List[str]
Return a list of environment agent ids
- reset(max_step: Optional[int] = None) → Union[None, Tuple[Dict[str, Any]]][source]
Reset environment and the episode info handler here.
- step(actions: Dict[str, Any]) → Tuple[Dict[str, Any], Dict[str, Any], Dict[str, float], Dict[str, bool], Any][source]
Return a 5-tuple of (state, observation, reward, done, info). Each item is a dict mapping agent ids to the corresponding entity.
Note
If state return is not activated for this environment, the returned state is None.
- Parameters:
actions (Dict[AgentID, Any]) – A dict of agent actions.
- Returns:
A tuple in the order (state, observation, reward, done, info).
- Return type:
Tuple[ Dict[AgentID, Any], Dict[AgentID, Any], Dict[AgentID, float], Dict[AgentID, bool], Any]
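The modular transformation that `Wrapper` is described as enabling can be sketched as follows. Both classes are stand-ins written for this example: `ToyEnv` is not MALib's `Environment`, and `RewardScaleWrapper` with its `scale` parameter is hypothetical. The sketch only shows the delegation pattern, not MALib's implementation.

```python
from typing import Any, Dict


class ToyEnv:
    """Stand-in environment exposing only what this sketch needs; not MALib's class."""

    possible_agents = ["agent_0", "agent_1"]

    def step(self, actions: Dict[str, Any]):
        observations = {aid: 0 for aid in self.possible_agents}
        rewards = {aid: 1.0 for aid in self.possible_agents}
        dones = {aid: False for aid in self.possible_agents}
        # No state return in this stub, so the first item is None.
        return None, observations, rewards, dones, {}


class RewardScaleWrapper:
    """Hypothetical wrapper that forwards calls to `env` and rescales rewards."""

    def __init__(self, env: ToyEnv, scale: float):
        self.env = env
        self.scale = scale

    @property
    def possible_agents(self):
        # Delegate read-only properties to the wrapped environment.
        return self.env.possible_agents

    def step(self, actions: Dict[str, Any]):
        state, observations, rewards, dones, infos = self.env.step(actions)
        scaled = {aid: reward * self.scale for aid, reward in rewards.items()}
        return state, observations, scaled, dones, infos


wrapped = RewardScaleWrapper(ToyEnv(), scale=0.5)
state, observations, rewards, dones, infos = wrapped.step({"agent_0": 0, "agent_1": 0})
print(rewards)
```

Because the wrapper keeps the same `step` return shape as the wrapped environment, wrappers of this kind can be stacked.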