Learning to Play Against Unknown Opponents

📅 2024-12-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper studies how a learning agent should play repeated asymmetric-information games against a sophisticated opponent whose payoff function is unknown: the agent knows only its own payoff function and a prior distribution over the opponent's payoffs, and aims to maximize its long-run average utility. Methodologically, the paper introduces a geometric, "menu"-structured optimization framework for learning-algorithm design that integrates distributionally robust optimization, no-regret learning theory, and polynomial-time approximation algorithms. Key contributions: (1) the first polynomial-time algorithm that is both ε-optimal and satisfies no-regret constraints; (2) ε-optimality without assuming knowledge of the opponent's exact type distribution, requiring only that the game size or the support size of the opponent's type distribution be constant; and (3) asymptotically optimal utility under no-regret constraints, achieved in polynomial time with no additional assumptions on the opponent's behavior or payoff structure.

📝 Abstract
We consider the problem of a learning agent who has to repeatedly play a general-sum game against a strategic opponent who acts to maximize their own payoff by optimally responding against the learner's algorithm. The learning agent knows their own payoff function, but is uncertain about the payoff of their opponent (knowing only that it is drawn from some distribution $\mathcal{D}$). What learning algorithm should the agent run in order to maximize their own total utility? We demonstrate how to construct an $\varepsilon$-optimal learning algorithm (obtaining average utility within $\varepsilon$ of the optimal utility) for this problem in time polynomial in the size of the input and $1/\varepsilon$ when either the size of the game or the support of $\mathcal{D}$ is constant. When the learning algorithm is further constrained to be a no-regret algorithm, we demonstrate how to efficiently construct an optimal learning algorithm (asymptotically achieving the optimal utility) in polynomial time, independent of any other assumptions. Both results make use of recently developed machinery that converts the analysis of learning algorithms to the study of the class of corresponding geometric objects known as menus.
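The abstract's no-regret constraint refers to the standard guarantee that an algorithm's cumulative loss approaches that of the best fixed action in hindsight. As a point of reference only (this is not the paper's menu-based construction), a minimal sketch of one classic no-regret algorithm, multiplicative weights (Hedge), might look like the following; the function name and interface are illustrative assumptions.

```python
import math

def hedge(loss_rounds, eta=0.5):
    """Multiplicative-weights (Hedge) over a fixed action set.

    loss_rounds: list of per-round loss vectors (one entry per action,
    losses in [0, 1]). Returns the algorithm's total expected loss and
    the best fixed action's total loss, so regret can be inspected.
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n
    total_loss = 0.0
    for losses in loss_rounds:
        z = sum(weights)
        probs = [w / z for w in weights]
        # Expected loss of the randomized play this round.
        total_loss += sum(p * l for p, l in zip(probs, losses))
        # Exponentially downweight actions that performed badly.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    best_fixed = min(sum(r[i] for r in loss_rounds) for i in range(n))
    return total_loss, best_fixed
```

With a suitably tuned step size, Hedge's regret grows as $O(\sqrt{T \log n})$ over $T$ rounds, so its average regret vanishes. The paper's question is which algorithm from this (constrained or unconstrained) class the learner should commit to against a best-responding opponent.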
Problem

Research questions and friction points this paper is trying to address.

Learning Algorithm
Asymmetric Information Games
Optimal Strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal Strategy Learning
Adversarial Repeated Games
Regret Minimization Algorithm