🤖 AI Summary
To address the limitations of monolithic play styles and the degraded performance of offline learning under high stochasticity and out-of-distribution (OOD) states in Mahjong AI, this paper proposes a lightweight style-aware Learning-from-Demonstration (LfD) framework. It introduces explicit style-preservation modeling into Mahjong LfD for the first time, jointly optimizing win rate and style fidelity by combining behavior cloning with policy regularization, requiring only fine-tuning on top of PPO. The method is trained end-to-end on multi-style expert trajectories. Evaluated on a standard Japanese Mahjong benchmark, it achieves a 12.7% win-rate improvement and 91.4% style similarity, significantly outperforming existing offline and LfD approaches and overcoming key learning bottlenecks posed by sparse rewards and OOD states.
📝 Abstract
A wide variety of bots in games enriches the gameplay experience and enhances replayability. Recent advances in game artificial intelligence have predominantly focused on improving the proficiency of bots; developing highly competent bots with a wide range of distinct play styles remains relatively under-explored. We select the Mahjong game environment as a case study. The high degree of randomness inherent in Mahjong and the prevalence of out-of-distribution states lead to suboptimal performance of existing offline learning and Learning-from-Demonstration (LfD) algorithms. In this paper, we leverage the gameplay histories of existing Mahjong agents and propose a novel LfD algorithm that requires only minimal modifications to the Proximal Policy Optimization (PPO) algorithm. Comprehensive empirical results show that our method not only significantly improves the proficiency of the agents but also effectively preserves their distinct play styles.
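The abstract describes a minimal modification to PPO that preserves style via demonstrations. As an illustration only (the paper's actual objective and hyperparameters are not given here; `bc_coef` and the exact regularizer are assumptions), one common way to combine a PPO clipped surrogate with a behavior-cloning term is to add the policy's log-likelihood on expert-demonstrated actions to the loss:

```python
import numpy as np

def ppo_bc_loss(ratio, advantage, expert_logp, clip_eps=0.2, bc_coef=0.5):
    """Hypothetical PPO + behavior-cloning objective (to minimize).

    ratio       : pi_new(a|s) / pi_old(a|s) on rollout samples
    advantage   : estimated advantages for those samples
    expert_logp : log pi_new(a_expert|s_expert) on demonstration samples
    """
    # Standard PPO clipped surrogate (maximized, hence negated below).
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    ppo_term = np.minimum(ratio * advantage, clipped * advantage)
    # Behavior-cloning regularizer: keep the policy likely under expert play,
    # which is one way to encode "style preservation".
    bc_term = bc_coef * expert_logp
    return -(ppo_term.mean() + bc_term.mean())

loss = ppo_bc_loss(
    ratio=np.array([1.0, 1.5]),
    advantage=np.array([1.0, 1.0]),
    expert_logp=np.array([-1.0, -1.0]),
)
```

The `bc_coef` weight trades off proficiency (the PPO term) against style fidelity (the demonstration term); a scheme like this needs only a one-term change to an existing PPO training loop, consistent with the "minimal modifications" claim.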