Deep Reinforcement Learning Xiangqi Player with Monte Carlo Tree Search

📅 2025-06-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Chinese chess (Xiangqi) poses unique modeling challenges for deep reinforcement learning (DRL) due to its high branching factor, asymmetric piece dynamics, and culturally specific rules, including the “general-facing” prohibition, river-bound movement constraints, and the expanded pawn movement gained upon crossing the river. Method: We propose a DRL–Monte Carlo Tree Search (MCTS) co-training paradigm tailored to cultural strategy games. Our approach employs a deep residual convolutional network with a shared policy-value head and integrates domain-informed action pruning and win-rate-guided backpropagation into MCTS. Contribution/Results: This work achieves the first end-to-end neural modeling of the full Chinese chess rule set. Empirical evaluation shows the agent attains professional shodan-level performance under standard rules, with a self-play win rate above 92%. When transferred to rule variants, it further demonstrates 40% higher sample efficiency, significantly improving generalization across game variants.
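
The search-side components named in the summary (policy priors, domain-informed action pruning, and win-rate-guided backpropagation) can be pictured with a minimal sketch. The Node fields, the prune_fn hook, and the net.evaluate interface below are illustrative assumptions, not the paper's actual code.

```python
# Illustrative sketch only: policy priors from the network, domain-informed
# action pruning at expansion, and win-rate-guided backpropagation.
# Node fields, prune_fn, and net.evaluate are assumptions, not the paper's code.
import math

class Node:
    def __init__(self, prior):
        self.prior = prior          # P(s, a) from the policy head
        self.visit_count = 0
        self.value_sum = 0.0        # accumulated win-rate estimates
        self.children = {}          # action -> child Node

    def value(self):
        # Mean win-rate estimate of this node so far.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def puct_score(parent, child, c_puct=1.5):
    # PUCT selection: exploit the running win-rate, explore via the prior.
    explore = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.value() + explore

def expand(node, state, net, prune_fn):
    # prune_fn applies domain knowledge (e.g. discarding moves that violate the
    # general-facing rule) before children receive their priors.
    policy, value = net.evaluate(state)          # assumed policy-value interface
    for action in prune_fn(state, policy):
        node.children[action] = Node(prior=policy[action])
    return value

def backpropagate(path, value):
    # Win-rate-guided backpropagation: push the leaf evaluation up the path,
    # flipping sign so each node scores positions from its own player's view.
    for node in reversed(path):
        node.visit_count += 1
        node.value_sum += value
        value = -value
```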

📝 Abstract
This paper presents a Deep Reinforcement Learning (DRL) system for Xiangqi (Chinese Chess) that integrates neural networks with Monte Carlo Tree Search (MCTS) to enable strategic self-play and self-improvement. Addressing the underexplored complexity of Xiangqi, including its unique board layout, piece movement constraints, and victory conditions, our approach combines policy-value networks with MCTS to simulate move consequences and refine decision-making. By overcoming challenges such as Xiangqi's high branching factor and asymmetrical piece dynamics, our work advances AI capabilities in culturally significant strategy games while providing insights for adapting DRL-MCTS frameworks to domain-specific rule systems.
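
As a concrete illustration of the policy-value network the abstract refers to, the following is a minimal PyTorch-style sketch of a residual network with a shared trunk and separate policy and value heads. The input-plane count, channel width, depth, and the 2086-move action encoding are assumptions for a 10x9 Xiangqi board, not figures taken from the paper.

```python
# Minimal sketch of a residual policy-value network with a shared trunk.
# All sizes (input planes, channels, depth, action space) are assumptions.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(x + h)            # residual connection

class PolicyValueNet(nn.Module):
    def __init__(self, in_planes=14, ch=128, n_blocks=10, n_actions=2086):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_planes, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU())
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        # Shared trunk feeds two heads: move probabilities and a scalar win-rate.
        self.policy_head = nn.Sequential(nn.Conv2d(ch, 4, 1), nn.Flatten(),
                                         nn.Linear(4 * 10 * 9, n_actions))
        self.value_head = nn.Sequential(nn.Conv2d(ch, 2, 1), nn.Flatten(),
                                        nn.Linear(2 * 10 * 9, 64), nn.ReLU(),
                                        nn.Linear(64, 1), nn.Tanh())

    def forward(self, board):               # board: (N, in_planes, 10, 9)
        h = self.body(self.stem(board))
        return self.policy_head(h), self.value_head(h)
```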
Problem

Research questions and friction points this paper is trying to address.

Xiangqi's complexity remains underexplored in DRL research
Unique board layout, piece movement constraints, and victory conditions are hard to model
High branching factor and asymmetrical piece dynamics challenge search and learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

DRL system integrates neural networks with MCTS
Policy-value networks combined with MCTS simulation
Adapts the DRL-MCTS framework to Xiangqi's complex rules through self-play (see the sketch below)
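
To make the self-play integration concrete, here is a hedged sketch of the data-generation loop typical of DRL-MCTS systems; env, mcts_fn, and buffer are hypothetical stand-ins, not interfaces from the paper.

```python
# Hedged sketch of a self-play loop; env, mcts_fn, and buffer are hypothetical
# stand-ins for the paper's environment, search routine, and replay storage.
import random

def sample_action(pi):
    # pi: dict mapping legal actions to MCTS-improved probabilities.
    actions, probs = zip(*pi.items())
    return random.choices(actions, weights=probs, k=1)[0]

def self_play_games(env, mcts_fn, buffer, n_games=100):
    for _ in range(n_games):
        state, trajectory = env.reset(), []
        while not env.is_terminal(state):
            pi = mcts_fn(state)                      # search-refined policy
            trajectory.append((state, env.to_move(state), pi))
            state = env.step(state, sample_action(pi))
        z = env.outcome(state)                       # +1 / 0 / -1 from Red's view
        for s, player, pi in trajectory:
            # Label each position with the final result from the perspective of
            # the player to move; the network is later retrained on (s, pi, z).
            buffer.add(s, pi, z if player == "red" else -z)
```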
Junyu Hu
Department of Electrical Engineering, The Fu Foundation School of Engineering and Applied Science, Columbia University, New York, NY 10027, USA
Jinsong Liu
Shanghai University of Finance and Economics
Operations Research, Reinforcement Learning
Berk Yilmaz
Columbia University
AI, Machine Learning