malib.evaluator.psro module
Implementation of a global evaluator for Policy Space Response Oracle (PSRO) algorithms. This evaluator estimates exploitability as the gap between the weighted payoff and the oracle (best-response) payoff.
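The quantity being evaluated can be illustrated on a two-player zero-sum matrix game. This is a minimal sketch, not MALib's implementation. It assumes the "weighted payoff" is each player's expected payoff under the current mixed profile and the "oracle payoff" is the payoff of a best pure response to the opponent's mixture. The function `exploitability` and its arguments are hypothetical.

```python
import numpy as np


def exploitability(payoff_row: np.ndarray, sigma_row: np.ndarray, sigma_col: np.ndarray) -> float:
    """NashConv of a mixed profile in a two-player zero-sum matrix game (hypothetical helper).

    payoff_row[i, j] is the row player's payoff; the column player receives -payoff_row[i, j].
    """
    payoff_col = -payoff_row
    # "Weighted" payoffs: each player's expected payoff under the mixed profile.
    weighted_row = sigma_row @ payoff_row @ sigma_col
    weighted_col = sigma_row @ payoff_col @ sigma_col
    # "Oracle" payoffs: value of the best pure response to the opponent's mixture.
    oracle_row = np.max(payoff_row @ sigma_col)
    oracle_col = np.max(sigma_row @ payoff_col)
    # Sum over players of the gain from deviating to a best response.
    return float((oracle_row - weighted_row) + (oracle_col - weighted_col))


# Rock-paper-scissors payoffs for the row player.
rps = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
uniform = np.full(3, 1.0 / 3.0)
rock = np.array([1.0, 0.0, 0.0])
print(exploitability(rps, uniform, uniform))  # uniform play is the Nash equilibrium
print(exploitability(rps, rock, uniform))  # always playing rock can be exploited
```

The first call returns zero (up to floating-point error) because the uniform profile is a Nash equilibrium of rock-paper-scissors; the second returns a positive value because the column player gains by switching to paper.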
- class malib.evaluator.psro.PSROEvaluator(**config)
Bases: malib.evaluator.base_evaluator.BaseEvaluator
Evaluator for Policy Space Response Oracle algorithms.
Create a PSRO evaluator instance.
- Parameters
config (Dict[str,Any]) – A dictionary of stopping metrics.
- class StopMetrics
Bases: object
Supported stopping metrics.
- MAX_ITERATION = 'max_iteration'
Maximum number of iterations.
- NASH_COV = 'nash cov'
- PAYOFF_DIFF_THRESHOLD = 'payoff_diff_threshold'
Threshold on the difference between the estimated payoff of the best response and that of the Nash equilibrium (NE).
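A stopping check driven by these metrics might look like the sketch below. It is an assumption, not MALib's API: it supposes that `config` maps the `StopMetrics` string values to limits, that iteration counting stops at `iteration >= max_iteration`, and that the payoff difference is compared in absolute value. The function `should_stop` is hypothetical, and `NASH_COV` is left out because its semantics are not documented here.

```python
from typing import Any, Dict, Optional

# String values mirror PSROEvaluator.StopMetrics as documented above.
MAX_ITERATION = "max_iteration"
PAYOFF_DIFF_THRESHOLD = "payoff_diff_threshold"


def should_stop(config: Dict[str, Any], iteration: int, payoff_diff: Optional[float] = None) -> bool:
    """Hypothetical stop check; 'nash cov' is intentionally not handled."""
    max_iter = config.get(MAX_ITERATION)
    if max_iter is not None and iteration >= max_iter:
        return True
    threshold = config.get(PAYOFF_DIFF_THRESHOLD)
    # Assumed: stop once |best-response payoff - NE payoff| falls within the threshold.
    if threshold is not None and payoff_diff is not None and abs(payoff_diff) <= threshold:
        return True
    return False


config = {MAX_ITERATION: 50, PAYOFF_DIFF_THRESHOLD: 1e-3}
print(should_stop(config, iteration=10, payoff_diff=0.2))
print(should_stop(config, iteration=10, payoff_diff=5e-4))
print(should_stop(config, iteration=50))
```

Metrics absent from `config` are simply ignored, so either criterion can be used on its own.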
- evaluate(content: Union[malib.utils.typing.RolloutFeedback, malib.utils.typing.TrainingFeedback], weighted_payoffs=None, oracle_payoffs=None, trainable_mapping=None)
Evaluate global convergence by measuring the margin between the Nash equilibrium payoff and the best-response payoff, which serves as an estimate of exploitability.
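The margin described above can be sketched per agent as follows. This assumes, since the signature does not say, that `weighted_payoffs` and `oracle_payoffs` are dicts mapping agent IDs to scalar payoffs; the helper `payoff_margin` and the agent IDs are hypothetical. Summing the per-agent gaps gives a NashConv-style exploitability estimate.

```python
from typing import Dict


def payoff_margin(weighted_payoffs: Dict[str, float], oracle_payoffs: Dict[str, float]) -> Dict[str, float]:
    """Per-agent gap between the oracle (best-response) payoff and the weighted (NE) payoff."""
    missing = weighted_payoffs.keys() - oracle_payoffs.keys()
    if missing:
        raise KeyError(f"no oracle payoff for agents: {sorted(missing)}")
    return {aid: oracle_payoffs[aid] - w for aid, w in weighted_payoffs.items()}


weighted = {"player_0": 0.25, "player_1": -0.25}
oracle = {"player_0": 0.5, "player_1": 0.0}
gaps = payoff_margin(weighted, oracle)
print(gaps)
print(sum(gaps.values()))  # total gap, an exploitability estimate
```

A total gap near zero means no agent gains much by switching to its best response, which is the convergence signal this method looks for.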