The impact of learning representation on agents’ behaviors in noncooperative games

  • Stefan Warren JUANG

Student thesis: Master's thesis

Abstract

This thesis investigates the learning representations of Nash equilibria (NE) in noncooperative games, where players independently optimize their individual preferences and potentials. We address two challenges: (1) reducing the computational cost of self-play algorithms such as PSRO, which must prevent cycles of strategy interactions, and (2) understanding how noncooperative games affect the behavioral diversity of a population of agents. For the first challenge, we establish a theoretical equivalence between cyclical strategies and the support strategies of a mixed-strategy NE. Leveraging this insight, we design a directed-graph representation that improves learning efficiency sixfold over the state-of-the-art algorithm Simplex-NeuPL. For the second challenge, we examine the phenomenon of Skill Transfer in population learning, where agents' behaviors in noncooperative games converge to a set of general, transferable behaviors under a single conditioned neural network. We derive the Policy Gradient Integration and demonstrate that Skill Transfer results from a learning representation that performs Interaction Information maximization (IIM) among agents' actions. Although IIM captures the generality of competitive behaviors and thereby accelerates population learning, it may not fully reflect agents' individual preferences and diverse potentials. To address this, we propose Joint mutual Entropy Minimization (JEM) to train a population of Generalists into Specialists. Our experiments show that our approach outperforms existing methods, with a 15% gain in behavioral diversity and a 22% increase in overall population performance. This thesis underscores the importance of understanding learning representation in noncooperative games.
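For readers unfamiliar with the quantity underlying IIM, the standard definition of interaction information for three random variables (here, the action variables of three agents, written X, Y, Z for illustration; the thesis's exact objective is not reproduced in this abstract) is:

$$I(X; Y; Z) \;=\; I(X; Y) \;-\; I(X; Y \mid Z)$$

that is, the amount by which conditioning on a third agent's actions reduces (or increases) the mutual information between the other two. Maximizing this quantity across a population encourages actions that share information jointly, which is consistent with the convergence to general, transferable behaviors described above.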

Date of Award: 2023
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
Supervisor: Nevin Lianwen ZHANG
