malib.evaluator.psro module

Implementation of the global evaluator for Policy Space Response Oracle (PSRO) algorithms. This evaluator estimates exploitability as the gap between the weighted payoff and the oracle payoff.

class malib.evaluator.psro.PSROEvaluator(**config)[source]

Bases: malib.evaluator.base_evaluator.BaseEvaluator

Evaluator for Policy Space Response Oracle algorithms

Create a PSRO evaluator instance.

Parameters: config (Dict[str,Any]) – A dictionary of stopping metrics.

class StopMetrics[source]

Bases: object

Supported stopping metrics

MAX_ITERATION = 'max_iteration': Maximum number of iterations

NASH_COV = 'nash cov'

PAYOFF_DIFF_THRESHOLD = 'payoff_diff_threshold': Threshold on the difference between the estimated payoff of the best response and that of the Nash equilibrium (NE)
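
As a rough illustration of how the stopping metrics above might be supplied, the sketch below builds a config dictionary keyed by the documented string values. The `StopMetrics` class here is a local stand-in that only mirrors the constants listed above (it is not imported from malib), and the numeric values are arbitrary. How `PSROEvaluator` interprets each entry is not described in this reference, so treat this as an assumption.

```python
from typing import Any, Dict


class StopMetrics:
    """Local stand-in mirroring the documented constants (not imported from malib)."""

    MAX_ITERATION = "max_iteration"
    NASH_COV = "nash cov"
    PAYOFF_DIFF_THRESHOLD = "payoff_diff_threshold"


# Arbitrary example values, chosen only for illustration.
stop_config: Dict[str, Any] = {
    StopMetrics.MAX_ITERATION: 10,
    StopMetrics.PAYOFF_DIFF_THRESHOLD: 1e-2,
}

print(stop_config)
```

With malib installed, a dictionary of this shape would presumably be unpacked into the constructor as keyword arguments, following the `PSROEvaluator(**config)` signature shown above.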

evaluate(content: Union[malib.utils.typing.RolloutFeedback, malib.utils.typing.TrainingFeedback], weighted_payoffs=None, oracle_payoffs=None, trainable_mapping=None)[source]: Evaluate global convergence by comparing the margin between the Nash payoff and the best-response payoff, i.e. an estimate of exploitability.
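
The comparison described above can be sketched in a few lines. This is a minimal illustration and not the library's implementation. It assumes that `weighted_payoffs` and `oracle_payoffs` are dictionaries mapping agent ids to scalar payoffs, and that the exploitability estimate is the sum of per-agent gaps. The helper names `estimate_exploitability` and `has_converged` are hypothetical.

```python
from typing import Dict


def estimate_exploitability(
    weighted_payoffs: Dict[str, float], oracle_payoffs: Dict[str, float]
) -> float:
    """Sum, over agents, of the best-response payoff minus the Nash-weighted payoff."""
    return sum(
        oracle_payoffs[agent] - weighted_payoffs[agent] for agent in weighted_payoffs
    )


def has_converged(
    weighted_payoffs: Dict[str, float],
    oracle_payoffs: Dict[str, float],
    payoff_diff_threshold: float,
) -> bool:
    """Report convergence once the estimated gap is within the threshold."""
    gap = estimate_exploitability(weighted_payoffs, oracle_payoffs)
    return gap <= payoff_diff_threshold


# Made-up payoffs for a two-player game (agent ids are illustrative).
weighted = {"player_0": 0.0, "player_1": 0.0}
oracle = {"player_0": 0.5, "player_1": 0.25}

gap = estimate_exploitability(weighted, oracle)
print(gap, has_converged(weighted, oracle, payoff_diff_threshold=0.5))
```

A small gap means that no agent gains much by deviating to its best response, which is the sense in which the PAYOFF_DIFF_THRESHOLD metric acts as a stopping criterion.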