Elevating Styled Mahjong Agents with Learning from Demonstration

📅 2025-06-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address two problems in Mahjong AI, namely agents that play in a single uniform style and offline learning that degrades under high stochasticity and out-of-distribution (OOD) states, this paper proposes a lightweight, style-aware Learning-from-Demonstration (LfD) framework. It introduces explicit style-preservation modeling into Mahjong LfD for the first time. It jointly optimizes win rate and style fidelity by combining behavior cloning with policy regularization, and it requires only minor modifications to PPO. The method is trained end-to-end on trajectories from multiple expert agents with distinct styles. On a standard Japanese Mahjong benchmark, it achieves a 12.7% win-rate improvement and 91.4% style similarity, outperforming existing offline and LfD approaches and easing the learning difficulties caused by sparse rewards and OOD states.
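The summary reports a style-similarity score without defining it. One simple proxy is top-1 action agreement: the fraction of game states in which the fine-tuned agent and the original styled agent choose the same legal action. This is offered purely as an assumption and is not necessarily the metric used in the paper. Below is a minimal Python sketch. The function names and the encoding of each policy's output as a list of action probabilities with a legality mask are hypothetical.

```python
from typing import Sequence


def greedy_legal_action(probs: Sequence[float], legal: Sequence[bool]) -> int:
    """Index of the highest-probability action among the legal ones.

    Hypothetical encoding: probs[i] is the policy's probability for action i,
    legal[i] says whether action i is allowed in this Mahjong state.
    """
    legal_ids = [i for i, ok in enumerate(legal) if ok]
    return max(legal_ids, key=lambda i: probs[i])


def action_agreement(
    probs_a: Sequence[Sequence[float]],
    probs_b: Sequence[Sequence[float]],
    legal_masks: Sequence[Sequence[bool]],
) -> float:
    """Fraction of states where both policies pick the same greedy legal action.

    An assumed style-similarity proxy, not the paper's definition.
    """
    matches = sum(
        greedy_legal_action(pa, m) == greedy_legal_action(pb, m)
        for pa, pb, m in zip(probs_a, probs_b, legal_masks)
    )
    return matches / len(legal_masks)


# Example: two policies evaluated on the same three states.
styled = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
finetuned = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]
all_legal = [[True, True, True]] * 3
print(action_agreement(styled, finetuned, all_legal))  # the policies agree on 2 of 3 states
```

Greedy agreement ignores how close the runner-up actions are. A distribution-level measure, such as a divergence between the two policies' action distributions, would be a natural alternative.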

Technology Category

Application Category

📝 Abstract
A wide variety of bots in games enriches the gameplay experience and enhances replayability. Recent advances in game artificial intelligence have focused predominantly on improving the proficiency of bots. Nevertheless, developing highly competent bots with a wide range of distinct play styles remains relatively under-explored. We select the Mahjong game environment as a case study. The high degree of randomness inherent in Mahjong and the prevalence of out-of-distribution states lead to suboptimal performance of existing offline learning and Learning-from-Demonstration (LfD) algorithms. In this paper, we leverage the gameplay histories of existing Mahjong agents and propose a novel LfD algorithm that requires only minimal modifications to the Proximal Policy Optimization (PPO) algorithm. Comprehensive empirical results show that our method not only significantly enhances the proficiency of the agents but also effectively preserves their unique play styles.
Problem

Research questions and friction points this paper is trying to address.

Developing Mahjong bots with diverse play styles
Improving performance in highly random game environments with many out-of-distribution states
Enhancing agent proficiency while preserving each agent's unique style
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages the gameplay histories of existing agents
Introduces a novel Learning-from-Demonstration algorithm
Requires only minimal modifications to Proximal Policy Optimization (PPO)
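The summary describes the method as behavior cloning combined with policy regularization on top of PPO. A common way to make such a "minimal modification" is to add a behavior-cloning term to PPO's clipped surrogate loss. This term is the negative log-likelihood of the demonstrated actions. The plain-Python sketch below shows that generic combination. The names `combined_loss` and `bc_coef` and the fixed weighting are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def ppo_clip_loss(ratios: Sequence[float], advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective, averaged over a batch.

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i) for sampled self-play actions.
    """
    terms = []
    for r, adv in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (minimum) of the unclipped and clipped surrogate.
        terms.append(min(r * adv, clipped * adv))
    return -sum(terms) / len(terms)


def bc_loss(demo_action_probs: Sequence[float]) -> float:
    """Behavior cloning: negative log-likelihood the current policy assigns
    to the actions recorded in a styled agent's gameplay history."""
    return -sum(math.log(p) for p in demo_action_probs) / len(demo_action_probs)


def combined_loss(
    ratios: Sequence[float],
    advantages: Sequence[float],
    demo_action_probs: Sequence[float],
    bc_coef: float = 0.5,  # hypothetical weight trading win rate against style fidelity
    clip_eps: float = 0.2,
) -> float:
    """PPO loss plus a behavior-cloning regularizer (assumed form, not the paper's)."""
    return ppo_clip_loss(ratios, advantages, clip_eps) + bc_coef * bc_loss(demo_action_probs)


# One self-play sample with ratio 1.0 and advantage 2.0, and one demonstrated
# action to which the current policy assigns probability exp(-2).
print(combined_loss([1.0], [2.0], [math.exp(-2)], bc_coef=0.5))  # about -1.0
```

Setting `bc_coef` to 0 recovers plain PPO. Larger values keep the policy closer to the demonstrated style, possibly at some cost in win rate. This summary does not specify how the paper balances the two goals.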
Lingfeng Li
Lingfeng Li
Hong Kong Centre for Cerebro-Cardiovascular Health Engineering
Y
Yunlong Lu
Department of Computer Science, Peking University
Y
Yongyi Wang
Department of Computer Science, Peking University
W
Wenxin Li
Department of Computer Science, Peking University