Efficient Last-iterate Convergence Algorithms in Solving Games

📅 2023-08-22
🏛️ arXiv.org
📈 Citations: 3
✨ Influential: 0

🤖 AI Summary
This work addresses the lack of last-iterate convergence guarantees for Counterfactual Regret Minimization (CFR)-type algorithms in extensive-form games (EFGs). We propose a novel Reward Transformation (RT) framework that reformulates Nash equilibrium computation as a sequence of strongly convex-concave optimization subproblems. For the first time, we establish the theoretical foundation of RT in discrete time. Building upon this, we design RTRM+ and RTCFR+, the first CFR variants achieving strict last-iterate convergence in discrete time without requiring uniqueness of the Nash equilibrium or regularization-based perturbations. Experiments demonstrate that our methods significantly outperform OGDA, OMWU, and CFR+ on both normal-form games (NFGs) and EFGs, exhibiting faster convergence, enhanced stability, and parameter independence.
📝 Abstract
No-regret algorithms are popular for learning a Nash equilibrium (NE) in two-player zero-sum normal-form games (NFGs) and extensive-form games (EFGs). Many recent works consider no-regret algorithms with last-iterate convergence. Among them, the two best-known algorithms are Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weight Update (OMWU). However, OGDA has high per-iteration complexity. OMWU has lower per-iteration complexity but poorer empirical performance, and its convergence holds only when the NE is unique. Recent works propose a Reward Transformation (RT) framework for MWU, which removes the uniqueness condition and achieves performance competitive with OMWU. Unfortunately, RT-based algorithms perform worse than OGDA under the same number of iterations, and their convergence guarantee relies on the continuous-time feedback assumption, which does not hold in most scenarios. To address these issues, we provide a closer analysis of the RT framework that holds for both continuous-time and discrete-time feedback. We demonstrate that the essence of the RT framework is to transform the problem of learning an NE in the original game into a series of strongly convex-concave optimization problems (SCCPs). We show that the bottleneck of RT-based algorithms is the speed of solving the SCCPs. To improve their empirical performance, we design a novel transformation method that enables the SCCPs to be solved by Regret Matching+ (RM+), a no-regret algorithm with better empirical performance, resulting in Reward Transformation RM+ (RTRM+). RTRM+ enjoys last-iterate convergence under the discrete-time feedback setting. Using the counterfactual regret decomposition framework, we propose Reward Transformation CFR+ (RTCFR+) to extend RTRM+ to EFGs. Experimental results show that our algorithms significantly outperform existing last-iterate convergence algorithms and RM+ (CFR+).
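For readers unfamiliar with RM+, the following is a minimal sketch of plain Regret Matching+ in self-play on a two-player zero-sum matrix game. It is not the paper's RTRM+: the reward transformation is not specified in this summary and is not reproduced here. The function names, the payoff matrix, and the iteration count are all illustrative assumptions. The sketch returns both the last iterate and the uniform average of the strategies, so the two can be compared by exploitability (the sum of both players' best-response gains, which is zero at an NE).

```python
import numpy as np


def rm_plus_strategy(q):
    """Map nonnegative cumulative regrets to a strategy (uniform if all are zero)."""
    total = q.sum()
    if total <= 0.0:
        return np.full(q.shape, 1.0 / q.size)
    return q / total


def exploitability(A, x, y):
    """Best-response gain of both players combined; zero exactly at an NE.

    Convention (an assumption of this sketch): the row player maximizes
    x^T A y and the column player minimizes it.
    """
    return float(np.max(A @ y) - np.min(A.T @ x))


def rm_plus_self_play(A, iterations=20000):
    """Run simultaneous RM+ updates; return last-iterate and average strategies."""
    n, m = A.shape
    qx, qy = np.zeros(n), np.zeros(m)
    sum_x, sum_y = np.zeros(n), np.zeros(m)
    for _ in range(iterations):
        x, y = rm_plus_strategy(qx), rm_plus_strategy(qy)
        ux = A @ y          # row player's payoff for each pure action
        uy = -(A.T @ x)     # column player's payoff for each pure action
        # RM+ update: add instantaneous regrets, then clip at zero.
        qx = np.maximum(qx + ux - x @ ux, 0.0)
        qy = np.maximum(qy + uy - y @ uy, 0.0)
        sum_x += x
        sum_y += y
    last_x, last_y = rm_plus_strategy(qx), rm_plus_strategy(qy)
    return last_x, last_y, sum_x / iterations, sum_y / iterations


if __name__ == "__main__":
    # Hypothetical payoff matrix (a biased rock-paper-scissors), not from the paper.
    A = np.array([[0.0, -1.0, 2.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
    last_x, last_y, avg_x, avg_y = rm_plus_self_play(A)
    print("average-strategy exploitability:", exploitability(A, avg_x, avg_y))
    print("last-iterate exploitability:   ", exploitability(A, last_x, last_y))
```

The standard no-regret argument bounds only the exploitability of the average strategy. Nothing in this sketch guarantees that the last iterate converges, which is the gap the abstract says RTRM+ is designed to close.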
Problem

Research questions and friction points this paper is trying to address.

Achieve last-iterate convergence in Counterfactual Regret Minimization (CFR) algorithms.
Improve empirical convergence rates using Regret Matching (RM)-based CFR algorithms.
Develop parameter-free algorithms for solving perturbed regularized extensive-form games (EFGs).
Innovation

Methods, ideas, or system contributions that make the work stand out.

RTCFR+, a CFR+ variant, achieves last-iterate convergence in EFGs.
RTCFR+ applies CFR+ to the perturbed, regularized EFGs produced by the reward transformation.
Parameter-free RTCFR+ enhances stability and convergence.