🤖 AI Summary
To address the limitations of monolithic play styles and the degraded performance of offline learning under high stochasticity and out-of-distribution (OOD) states in Mahjong AI, this paper proposes a lightweight style-aware Learning-from-Demonstration (LfD) framework. It introduces explicit style-preservation modeling into Mahjong LfD for the first time, jointly optimizing win rate and style fidelity by combining behavior cloning with policy regularization, requiring only fine-tuning on top of PPO. The method is trained end-to-end on multi-style expert trajectories. Evaluated on a standard Japanese Mahjong benchmark, it achieves a 12.7% win-rate improvement and 91.4% style similarity, significantly outperforming existing offline and LfD approaches and overcoming key learning bottlenecks posed by sparse rewards and OOD states.
📝 Abstract
A wide variety of bots in games enriches the gameplay experience and enhances replayability. Recent advances in game artificial intelligence have predominantly focused on improving the proficiency of bots; developing highly competent bots with a wide range of distinct play styles remains relatively under-explored. We select the Mahjong game environment as a case study. The high degree of randomness inherent in Mahjong and the prevalence of out-of-distribution states lead to suboptimal performance of existing offline learning and Learning-from-Demonstration (LfD) algorithms. In this paper, we leverage the gameplay histories of existing Mahjong agents and propose a novel LfD algorithm that requires only minimal modifications to the Proximal Policy Optimization (PPO) algorithm. Comprehensive empirical results show that our method not only significantly improves the proficiency of the agents but also effectively preserves their distinct play styles.
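The abstract describes a minimal modification to PPO that preserves style via demonstrations. As an illustration only (the paper's actual objective and hyperparameters are not given here; `bc_coef` and the exact regularizer are assumptions), one common way to combine a PPO clipped surrogate with a behavior-cloning term is to add the policy's log-likelihood on expert-demonstrated actions to the loss:

```python
import numpy as np

def ppo_bc_loss(ratio, advantage, expert_logp, clip_eps=0.2, bc_coef=0.5):
    """Hypothetical PPO + behavior-cloning objective (to minimize).

    ratio       : pi_new(a|s) / pi_old(a|s) on rollout samples
    advantage   : estimated advantages for those samples
    expert_logp : log pi_new(a_expert|s_expert) on demonstration samples
    """
    # Standard PPO clipped surrogate (maximized, hence negated below).
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    ppo_term = np.minimum(ratio * advantage, clipped * advantage)
    # Behavior-cloning regularizer: keep the policy likely under expert play,
    # which is one way to encode "style preservation".
    bc_term = bc_coef * expert_logp
    return -(ppo_term.mean() + bc_term.mean())

loss = ppo_bc_loss(
    ratio=np.array([1.0, 1.5]),
    advantage=np.array([1.0, 1.0]),
    expert_logp=np.array([-1.0, -1.0]),
)
```

The `bc_coef` weight trades off proficiency (the PPO term) against style fidelity (the demonstration term); a scheme like this needs only a one-term change to an existing PPO training loop, consistent with the "minimal modifications" claim.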