Deep (Predictive) Discounted Counterfactual Regret Minimization

📅 2025-11-11
🤖 AI Summary
Problem: Traditional Counterfactual Regret Minimization (CFR) methods incur high computational overhead and scale poorly to large imperfect-information games, and existing neural approximations are mainly based on vanilla CFR. Method: This paper proposes a model-free neural CFR algorithm that approximates advanced CFR variants (e.g., DCFR, LCFR). At each iteration it collects variance-reduced sampled advantages using a value network, fits cumulative advantages by bootstrapping, and applies discounting and clipping to simulate the update rules of those variants. Results: Experiments show faster convergence than existing model-free neural algorithms on typical imperfect-information games and stronger head-to-head performance in a large poker game.

📝 Abstract
Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. To enhance CFR's applicability in large games, researchers use neural networks to approximate its behavior. However, existing methods are mainly based on vanilla CFR and struggle to effectively integrate more advanced CFR variants. In this work, we propose an efficient model-free neural CFR algorithm, overcoming the limitations of existing methods in approximating advanced CFR variants. At each iteration, it collects variance-reduced sampled advantages based on a value network, fits cumulative advantages by bootstrapping, and applies discounting and clipping operations to simulate the update mechanisms of advanced CFR variants. Experimental results show that, compared with model-free neural algorithms, it exhibits faster convergence in typical imperfect-information games and demonstrates stronger adversarial performance in a large poker game.
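The first step described in the abstract, collecting variance-reduced sampled advantages with the help of a value network, can be illustrated with a standard baseline (control-variate) estimator. The sketch below is not taken from the paper: the function names, the plain lists standing in for value-network outputs, and the exact estimator form are assumptions chosen for illustration.

```python
def baseline_corrected_values(sampled_action, sampled_return, sample_probs, baseline):
    """Estimate all action values from ONE sampled action.

    baseline[a] is a hypothetical stand-in for a value-network prediction.
    Unsampled actions keep their baseline value; the sampled action gets an
    importance-weighted correction, so the estimate is unbiased for any baseline.
    """
    estimates = list(baseline)
    a = sampled_action
    estimates[a] += (sampled_return - baseline[a]) / sample_probs[a]
    return estimates


def sampled_advantages(policy, estimated_values):
    """Advantage of each action relative to the policy's expected value."""
    state_value = sum(p * v for p, v in zip(policy, estimated_values))
    return [v - state_value for v in estimated_values]


if __name__ == "__main__":
    policy = [0.5, 0.25, 0.25]        # current strategy at an information set
    sample_probs = [0.2, 0.3, 0.5]    # sampling distribution (assumed known)
    baseline = [0.8, -1.5, 0.0]       # hypothetical value predictions
    est = baseline_corrected_values(1, -2.0, sample_probs, baseline)
    print(sampled_advantages(policy, est))
```

For any baseline, the estimate is unbiased under the sampling distribution. The closer the baseline is to the true action values, the smaller the correction term and the lower the variance, which is the usual motivation for learning the baseline with a value network.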
Problem

Research questions and friction points this paper is trying to address.

Approximating advanced CFR variants with neural networks in large imperfect-information games
Overcoming the reliance of existing neural CFR methods on vanilla CFR
Improving convergence speed in typical imperfect-information games and adversarial performance in a large poker game
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-free neural CFR algorithm approximates advanced variants
Uses value network for variance-reduced advantage sampling
Applies discounting and clipping to simulate CFR updates
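The discounting and clipping listed above mirror tabular update rules: DCFR multiplies accumulated positive and negative regrets by separate iteration-dependent weights, and CFR+ clips cumulative regret at zero. A minimal tabular sketch follows, assuming the DCFR weights t^alpha / (t^alpha + 1) and t^beta / (t^beta + 1). The function names and the `clip` flag are hypothetical, and the paper approximates such quantities with neural networks rather than storing them in a table.

```python
def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret: fall back to the uniform strategy.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def dcfr_discount(cum_regret, t, alpha=1.5, beta=0.0):
    """Scale positive and negative cumulative regrets by DCFR-style weights."""
    pos_w = t ** alpha / (t ** alpha + 1.0)
    neg_w = t ** beta / (t ** beta + 1.0)
    return [r * (pos_w if r > 0 else neg_w) for r in cum_regret]


def clip_at_zero(cum_regret):
    """CFR+-style clipping: discard negative cumulative regret."""
    return [max(r, 0.0) for r in cum_regret]


def update(cum_regret, instantaneous, t, alpha=1.5, beta=0.0, clip=False):
    """Add the iteration-t regrets (t >= 1), then discount and optionally clip."""
    new = [r + g for r, g in zip(cum_regret, instantaneous)]
    new = dcfr_discount(new, t, alpha, beta)
    return clip_at_zero(new) if clip else new


if __name__ == "__main__":
    regrets = [0.0, 0.0]
    for t in (1, 2, 3):
        regrets = update(regrets, [1.0, -1.0], t)
        print(t, regrets, regret_matching(regrets))
```

Setting `alpha = beta = 1` in this sketch weights the regret from iteration t proportionally to t, which corresponds to linear CFR. Larger `alpha` and smaller `beta` make the update forget early negative regret faster.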
Hang Xu
C2DL, Institute of Automation, Chinese Academy of Sciences
Kai Li
C2DL, Institute of Automation, Chinese Academy of Sciences
Haobo Fu
Tencent AI Lab, University of Birmingham
Reinforcement Learning, Evolutionary Computation
Qiang Fu
Tencent AI Lab
Junliang Xing
Tsinghua University
Jian Cheng
C2DL, Institute of Automation, Chinese Academy of Sciences