🤖 AI Summary
Predictive counterfactual regret minimization (PCFR⁺) converges quickly in many two-player zero-sum imperfect-information extensive-form games, but its empirical convergence rate degrades significantly when its predictions are inaccurate. To make it more robust, the paper proposes asynchronous predictive CFR⁺ (APCFR⁺) and a simplified variant, SAPCFR⁺. APCFR⁺ adaptively asynchronizes the step sizes used to update the implicit and explicit accumulated counterfactual regrets, which mitigates the effect of prediction inaccuracy on convergence; a theoretical analysis explains why this improves robustness. SAPCFR⁺ replaces the adaptive scheme with a fixed asynchronization of step sizes, requires only a single-line modification of the original PCFR⁺, and achieves a worst-case theoretical regret bound that is lower than that of PCFR⁺ by a constant factor. Experiments show that both algorithms outperform PCFR⁺ in most of the tested games, and that SAPCFR⁺ attains an empirical convergence rate comparable to that of APCFR⁺.
📝 Abstract
Counterfactual Regret Minimization (CFR) algorithms are widely used to compute a Nash equilibrium (NE) in two-player zero-sum imperfect-information extensive-form games (IIGs). Among them, Predictive CFR$^+$ (PCFR$^+$) is particularly powerful: in many games, its use of predictions yields an exceptionally fast empirical convergence rate. However, the empirical convergence rate of PCFR$^+$ degrades significantly when the prediction is inaccurate, leading to unstable performance on certain IIGs. To enhance the robustness of PCFR$^+$, we propose a novel variant, Asynchronous PCFR$^+$ (APCFR$^+$), which adaptively asynchronizes the step sizes between the updates of the implicit and explicit accumulated counterfactual regrets to mitigate the impact of prediction inaccuracy on convergence. We present a theoretical analysis showing why APCFR$^+$ enhances robustness. Finally, we propose a simplified version of APCFR$^+$ called Simple APCFR$^+$ (SAPCFR$^+$), which uses a fixed asynchronization of step sizes and can therefore be implemented with a single-line modification of the original PCFR$^+$. Interestingly, the worst-case theoretical regret bound of SAPCFR$^+$ is lower than that of PCFR$^+$ by a constant factor. Experimental results demonstrate that (i) both APCFR$^+$ and SAPCFR$^+$ outperform PCFR$^+$ in most of the tested games, and (ii) SAPCFR$^+$ achieves an empirical convergence rate comparable to that of APCFR$^+$.
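The abstract does not spell out the update rules, so the following is only a rough sketch of the general idea, not the paper's algorithm. It runs predictive regret matching$^+$ on a small matrix game (rather than a full extensive-form game), uses the previous instantaneous regret as the prediction, and assumes that a fixed asynchronization can be pictured as weighting the prediction differently from the observed regret. The parameter `pred_scale`, the function names, the simultaneous updates, and the uniform averaging are all illustrative assumptions.

```python
def positive_part(values):
    """Clip every entry at zero (the '+' in regret matching+)."""
    return [max(v, 0.0) for v in values]


def to_strategy(weights):
    """Normalize non-negative weights; fall back to uniform if all are zero."""
    total = sum(weights)
    if total <= 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def instantaneous_regret(action_values, strategy):
    """Regret of each action relative to the strategy's expected value."""
    expected = sum(p * v for p, v in zip(strategy, action_values))
    return [v - expected for v in action_values]


def implicit_strategy(explicit, prediction, pred_scale):
    """Strategy from the implicit regrets: explicit regrets plus a scaled prediction.

    pred_scale is a hypothetical knob for this sketch, not the paper's parameter.
    """
    implicit = [r + pred_scale * m for r, m in zip(explicit, prediction)]
    return to_strategy(positive_part(implicit))


def solve_matrix_game(payoff, iterations=10000, pred_scale=1.0):
    """Self-play on a zero-sum matrix game; payoff[i][j] is the row player's payoff."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols  # explicit regrets
    row_pred, col_pred = [0.0] * n_rows, [0.0] * n_cols      # last observed regrets
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols        # for average strategies

    for _ in range(iterations):
        x = implicit_strategy(row_regret, row_pred, pred_scale)
        y = implicit_strategy(col_regret, col_pred, pred_scale)

        row_values = [sum(payoff[i][j] * y[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * x[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_r = instantaneous_regret(row_values, x)
        col_r = instantaneous_regret(col_values, y)

        # Explicit update always uses the full observed regret.
        row_regret = positive_part([a + b for a, b in zip(row_regret, row_r)])
        col_regret = positive_part([a + b for a, b in zip(col_regret, col_r)])
        row_pred, col_pred = row_r, col_r

        row_sum = [s + p for s, p in zip(row_sum, x)]
        col_sum = [s + p for s, p in zip(col_sum, y)]

    return to_strategy(row_sum), to_strategy(col_sum)


def exploitability(payoff, x, y):
    """Sum of both players' best-response gains; zero at a Nash equilibrium."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(n_cols))
                   for i in range(n_rows))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row - best_col
```

In this sketch, `pred_scale=1.0` gives plain predictive regret matching$^+$ and `pred_scale=0.0` gives regret matching$^+$ with no prediction. Changing the single expression inside `implicit_strategy` is the kind of small edit the abstract alludes to, though the actual rule and constants used by SAPCFR$^+$ should be taken from the paper itself.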