Deep Reinforcement Learning Xiangqi Player with Monte Carlo Tree Search

📅 2025-06-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Chinese chess (Xiangqi) poses unique modeling challenges for deep reinforcement learning (DRL) due to its high branching factor, asymmetric piece dynamics, and culturally specific rules, including the “general-facing” prohibition, river-bound movement constraints, and the expanded pawn movement gained upon crossing the river. Method: We propose a DRL–Monte Carlo Tree Search (MCTS) co-training paradigm tailored to cultural strategy games. Our approach employs a deep residual convolutional network with a shared policy-value head and integrates domain-informed action pruning and win-rate-guided backpropagation into MCTS. Contribution/Results: This work achieves the first end-to-end neural modeling of the full Chinese chess rule set. Empirical evaluation shows the agent attains professional shodan-level performance under standard rules, with a self-play win rate above 92%. When transferred to rule variants, it further demonstrates 40% higher sample efficiency, significantly improving generalization across game variants.
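
The search-side components named in the summary (policy priors, domain-informed action pruning, and win-rate-guided backpropagation) can be pictured with a minimal sketch. The Node fields, the prune_fn hook, and the net.evaluate interface below are illustrative assumptions, not the paper's actual code.

```python
# Illustrative sketch only: policy priors from the network, domain-informed
# action pruning at expansion, and win-rate-guided backpropagation.
# Node fields, prune_fn, and net.evaluate are assumptions, not the paper's code.
import math

class Node:
    def __init__(self, prior):
        self.prior = prior          # P(s, a) from the policy head
        self.visit_count = 0
        self.value_sum = 0.0        # accumulated win-rate estimates
        self.children = {}          # action -> child Node

    def value(self):
        # Mean win-rate estimate of this node so far.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def puct_score(parent, child, c_puct=1.5):
    # PUCT selection: exploit the running win-rate, explore via the prior.
    explore = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.value() + explore

def expand(node, state, net, prune_fn):
    # prune_fn applies domain knowledge (e.g. discarding moves that violate the
    # general-facing rule) before children receive their priors.
    policy, value = net.evaluate(state)          # assumed policy-value interface
    for action in prune_fn(state, policy):
        node.children[action] = Node(prior=policy[action])
    return value

def backpropagate(path, value):
    # Win-rate-guided backpropagation: push the leaf evaluation up the path,
    # flipping sign so each node scores positions from its own player's view.
    for node in reversed(path):
        node.visit_count += 1
        node.value_sum += value
        value = -value
```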

📝 Abstract
This paper presents a Deep Reinforcement Learning (DRL) system for Xiangqi (Chinese Chess) that integrates neural networks with Monte Carlo Tree Search (MCTS) to enable strategic self-play and self-improvement. Addressing the underexplored complexity of Xiangqi, including its unique board layout, piece movement constraints, and victory conditions, our approach combines policy-value networks with MCTS to simulate move consequences and refine decision-making. By overcoming challenges such as Xiangqi's high branching factor and asymmetrical piece dynamics, our work advances AI capabilities in culturally significant strategy games while providing insights for adapting DRL-MCTS frameworks to domain-specific rule systems.
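
As a concrete illustration of the policy-value network the abstract refers to, the following is a minimal PyTorch-style sketch of a residual network with a shared trunk and separate policy and value heads. The input-plane count, channel width, depth, and the 2086-move action encoding are assumptions for a 10x9 Xiangqi board, not figures taken from the paper.

```python
# Minimal sketch of a residual policy-value network with a shared trunk.
# All sizes (input planes, channels, depth, action space) are assumptions.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(x + h)            # residual connection

class PolicyValueNet(nn.Module):
    def __init__(self, in_planes=14, ch=128, n_blocks=10, n_actions=2086):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_planes, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU())
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        # Shared trunk feeds two heads: move probabilities and a scalar win-rate.
        self.policy_head = nn.Sequential(nn.Conv2d(ch, 4, 1), nn.Flatten(),
                                         nn.Linear(4 * 10 * 9, n_actions))
        self.value_head = nn.Sequential(nn.Conv2d(ch, 2, 1), nn.Flatten(),
                                        nn.Linear(2 * 10 * 9, 64), nn.ReLU(),
                                        nn.Linear(64, 1), nn.Tanh())

    def forward(self, board):               # board: (N, in_planes, 10, 9)
        h = self.body(self.stem(board))
        return self.policy_head(h), self.value_head(h)
```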
Problem

Research questions and friction points this paper is trying to address.

Xiangqi's complexity remains underexplored in DRL research
Unique board layout, piece movement constraints, and victory conditions are hard to model
High branching factor and asymmetrical piece dynamics challenge search and learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

DRL system integrates neural networks with MCTS
Policy-value networks combined with MCTS simulation
Adapts the DRL-MCTS framework to Xiangqi's complex rules through self-play (see the sketch below)
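
To make the self-play integration concrete, here is a hedged sketch of the data-generation loop typical of DRL-MCTS systems; env, mcts_fn, and buffer are hypothetical stand-ins, not interfaces from the paper.

```python
# Hedged sketch of a self-play loop; env, mcts_fn, and buffer are hypothetical
# stand-ins for the paper's environment, search routine, and replay storage.
import random

def sample_action(pi):
    # pi: dict mapping legal actions to MCTS-improved probabilities.
    actions, probs = zip(*pi.items())
    return random.choices(actions, weights=probs, k=1)[0]

def self_play_games(env, mcts_fn, buffer, n_games=100):
    for _ in range(n_games):
        state, trajectory = env.reset(), []
        while not env.is_terminal(state):
            pi = mcts_fn(state)                      # search-refined policy
            trajectory.append((state, env.to_move(state), pi))
            state = env.step(state, sample_action(pi))
        z = env.outcome(state)                       # +1 / 0 / -1 from Red's view
        for s, player, pi in trajectory:
            # Label each position with the final result from the perspective of
            # the player to move; the network is later retrained on (s, pi, z).
            buffer.add(s, pi, z if player == "red" else -z)
```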
Junyu Hu
Department of Electrical Engineering, The Fu Foundation School of Engineering and Applied Science, Columbia University, New York, NY 10027, USA
Jinsong Liu
Shanghai University of Finance and Economics
Operations Research, Reinforcement Learning
Berk Yilmaz
Columbia University
AI, Machine Learning