Efficient Last-iterate Convergence Algorithms in Solving Games

πŸ“… 2023-08-22
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 3
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the lack of last-iterate convergence guarantees for Counterfactual Regret Minimization (CFR)-type algorithms in extensive-form games (EFGs). We propose a novel Reward Transformation (RT) framework that reformulates Nash equilibrium computation as a sequence of strongly convex-concave optimization subproblems. For the first time, we establish the theoretical foundation of RT in discrete time. Building upon this, we design RTRM+ and RTCFR+β€”the first CFR variants achieving strict last-iterate convergence in discrete time without requiring uniqueness of the Nash equilibrium or regularization-based perturbations. Experiments demonstrate that our methods significantly outperform OGDA, OMWU, and CFR+ on both normal-form games (NFGs) and EFGs, exhibiting faster convergence, enhanced stability, and parameter independence.
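The summary above frames the RT idea as replacing one saddle-point problem with a sequence of strongly convex-concave subproblems, each anchored at the previous solution. A minimal sketch of that scheme on a normal-form game, assuming Euclidean regularization and projected gradient descent-ascent as the inner solver (the function names, step sizes, and regularizer choice here are illustrative assumptions, not the paper's RTRM+/RTCFR+):

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    tau = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + tau, 0)

def rt_sequence(A, mu=0.5, outer=50, inner=400, eta=0.05):
    """Hypothetical sketch of the RT idea: solve a sequence of
    strongly convex-concave subproblems
        max_x min_y  x^T A y - (mu/2)||x - x_k||^2 + (mu/2)||y - y_k||^2
    by projected gradient descent-ascent, re-anchoring each
    subproblem at the previous approximate solution."""
    n, m = A.shape
    x = np.full(n, 1 / n)
    y = np.full(m, 1 / m)
    for _ in range(outer):
        xa, ya = x.copy(), y.copy()  # anchors for this subproblem
        for _ in range(inner):
            gx = A @ y - mu * (x - xa)       # ascent direction for x
            gy = A.T @ x + mu * (y - ya)     # descent direction for y
            x = project_simplex(x + eta * gx)
            y = project_simplex(y - eta * gy)
    return x, y

A = np.array([[2.0, -1.0], [-1.0, 1.0]])  # zero-sum game, NE: x* = y* = (0.4, 0.6)
x, y = rt_sequence(A)
```

Because each subproblem is strongly convex-concave, its inner iterates converge to a unique saddle point, and re-anchoring makes the outer sequence a proximal-point-style iteration whose last iterate approaches the NE, which is the property the RT framework is after.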
πŸ“ Abstract
No-regret algorithms are popular for learning Nash equilibrium (NE) in two-player zero-sum normal-form games (NFGs) and extensive-form games (EFGs). Many recent works study no-regret algorithms with last-iterate convergence. Among them, the two most famous algorithms are Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weight Update (OMWU). However, OGDA has high per-iteration complexity. OMWU exhibits lower per-iteration complexity but poorer empirical performance, and its convergence holds only when the NE is unique. Recent works propose a Reward Transformation (RT) framework for MWU, which removes the uniqueness condition and achieves performance competitive with OMWU. Unfortunately, RT-based algorithms perform worse than OGDA under the same number of iterations, and their convergence guarantee relies on the continuous-time feedback assumption, which does not hold in most scenarios. To address these issues, we provide a closer analysis of the RT framework, which holds for both continuous- and discrete-time feedback. We demonstrate that the essence of the RT framework is to transform the problem of learning an NE in the original game into a series of strongly convex-concave optimization problems (SCCPs). We show that the bottleneck of RT-based algorithms is the speed of solving the SCCPs. To improve their empirical performance, we design a novel transformation method so that the SCCPs can be solved by Regret Matching+ (RM+), a no-regret algorithm with better empirical performance, resulting in Reward Transformation RM+ (RTRM+). RTRM+ enjoys last-iterate convergence under the discrete-time feedback setting. Using the counterfactual regret decomposition framework, we propose Reward Transformation CFR+ (RTCFR+) to extend RTRM+ to EFGs. Experimental results show that our algorithms significantly outperform existing last-iterate convergence algorithms and RM+ (CFR+).
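The abstract's core primitive is Regret Matching+: accumulate the positive parts of instantaneous regrets and play each action with probability proportional to its accumulated regret. A minimal self-play sketch on a 2x2 zero-sum NFG (this is plain RM+, whose average strategy converges to an NE; it omits the paper's reward transformation, which is what yields last-iterate convergence):

```python
import numpy as np

def rm_plus_self_play(payoff, iters=10000):
    """Self-play with Regret Matching+ on a two-player zero-sum NFG.
    Row player maximizes x^T A y; column player minimizes it.
    Returns the average strategies, which converge to an NE."""
    n, m = payoff.shape
    q_x = np.zeros(n)  # cumulative clipped regrets, row player
    q_y = np.zeros(m)  # cumulative clipped regrets, column player
    avg_x = np.zeros(n)
    avg_y = np.zeros(m)
    for _ in range(iters):
        # play proportionally to positive regrets; uniform if all zero
        x = q_x / q_x.sum() if q_x.sum() > 0 else np.full(n, 1 / n)
        y = q_y / q_y.sum() if q_y.sum() > 0 else np.full(m, 1 / m)
        u_x = payoff @ y         # row player's per-action utilities
        u_y = -(payoff.T @ x)    # column player's per-action utilities
        # RM+ update: add instantaneous regrets, clip at zero
        q_x = np.maximum(q_x + u_x - x @ u_x, 0)
        q_y = np.maximum(q_y + u_y - y @ u_y, 0)
        avg_x += x
        avg_y += y
    return avg_x / iters, avg_y / iters

A = np.array([[2.0, -1.0], [-1.0, 1.0]])  # zero-sum game, NE: x* = y* = (0.4, 0.6)
x_bar, y_bar = rm_plus_self_play(A)
```

Note that only the averages `x_bar, y_bar` are guaranteed to approach the NE here; the current strategies `x, y` can cycle, which is exactly the gap between average-iterate and last-iterate convergence that RTRM+ addresses.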
Problem

Research questions and friction points this paper is trying to address.

Achieve last-iterate convergence in Counterfactual Regret Minimization (CFR) algorithms.
Improve empirical convergence rates using Regret Matching (RM)-based CFR algorithms.
Develop parameter-free algorithms for solving perturbed regularized extensive-form games (EFGs).
Innovation

Methods, ideas, or system contributions that make the work stand out.

RTCFR+ is the first CFR-type algorithm to achieve last-iterate convergence in EFGs.
RTCFR+ leverages CFR+ (via RTRM+) to solve the perturbed, regularized EFGs.
The parameter-free design of RTCFR+ enhances stability and convergence.
πŸ”Ž Similar Papers
No similar papers found.
Lin Meng
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Zhenxing Ge
Nanjing University
Wenbin Li
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Bo An
School of Computer Science and Engineering, Nanyang Technological University, Singapore
Yang Gao
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China