Empirical Game-Theoretic Methods to Minimize Regret Against Specific Opponents
Published in Proceedings of SPIE Defense + Commercial Sensing Symposium, 2021
In many real-world multi-domain applications, an agent that can sense its opponent’s strategy from previous rounds can exploit that opponent for higher payoffs by playing a specific Best Response (BR) strategy. The optimal way to exploit opponents in such contests is to learn a specific BR strategy for each candidate opponent strategy. However, in a large strategy space, learning BR policies for all opponent strategies is infeasible. A reasonable trade-off is needed: learning only a limited number of BR strategies that still exploit the opponents’ strategies. This motivates us to propose a novel method for finding better agent strategies when facing a specific set of opponents in repeated games. Our method, Clustered Double Oracle Empirical Game-Theoretic Analysis (CDO-EGTA), builds upon the classic Double Oracle (DO) framework and combines it with Deep Q-Network (DQN) and clustering methods. Empirical results show that our method achieves lower regret than current state-of-the-art methods.
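The trade-off described in the abstract, between learning one BR per opponent strategy and learning only a few, can be illustrated with a toy calculation. The sketch below is not CDO-EGTA: it uses no Double Oracle loop and no DQN, and the payoff table, the hand-picked clusters, and the function names are all hypothetical, chosen only for illustration. It assumes pure strategies and measures regret as the gap between the best achievable payoff against an opponent and the payoff of the response actually played.

```python
from statistics import mean

# Hypothetical payoff table (an assumption for this sketch, not data from the
# paper): PAYOFF[a][o] is the agent's payoff for action a against opponent o.
PAYOFF = [
    [4.0, 3.0, 1.0, 1.0],
    [2.0, 4.0, 1.0, 0.0],
    [0.0, 1.0, 4.0, 2.0],
    [1.0, 0.0, 3.0, 4.0],
]


def best_response(payoff, opponents):
    """Return the action with the highest mean payoff against `opponents`."""
    return max(
        range(len(payoff)),
        key=lambda a: mean(payoff[a][o] for o in opponents),
    )


def regret(payoff, action, opponent):
    """Payoff lost by playing `action` instead of the true BR to `opponent`."""
    best = max(row[opponent] for row in payoff)
    return best - payoff[action][opponent]


def clustered_regrets(payoff, clusters):
    """Learn one response per cluster and report the regret per opponent."""
    regrets = {}
    for cluster in clusters:
        action = best_response(payoff, cluster)
        for opponent in cluster:
            regrets[opponent] = regret(payoff, action, opponent)
    return regrets


# Three budgets: one BR per opponent, one per hand-picked cluster, one overall.
per_opponent = clustered_regrets(PAYOFF, [[0], [1], [2], [3]])
per_cluster = clustered_regrets(PAYOFF, [[0, 1], [2, 3]])
single = clustered_regrets(PAYOFF, [[0, 1, 2, 3]])

for label, result in [
    ("one BR per opponent", per_opponent),
    ("one BR per cluster", per_cluster),
    ("single BR", single),
]:
    print(f"{label}: total regret = {sum(result.values())}")
```

In this toy table, grouping similar opponents recovers most of the payoff of the per-opponent responses while learning half as many of them. In the setting of the paper the responses would be learned policies rather than table lookups, and the clusters would come from a clustering method rather than being fixed by hand.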
Recommended citation: M. Porag, J. Perez, C. Kiekintveld, T. Son, W. Yeoh, E. Pontelli, "Empirical Game-Theoretic Methods to Minimize Regret Against Specific Opponents". Proceedings of SPIE Defense + Commercial Sensing Symposium, 2021.