Empowering LLMs in Decision Games through Algorithmic Data Synthesis

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the weak reasoning capabilities and poor generalization of large language models (LLMs) in complex decision-making games (e.g., Doudizhu, Go). We propose the first algorithmic data synthesis framework tailored for decision-making games. Our method integrates rule-guided trajectory generation, self-play sampling, offline trajectory distillation, and multi-stage instruction tuning, augmented by a reinforcement-based feedback alignment mechanism. Key contributions are: (1) a scalable, high-fidelity paradigm for synthesizing decision trajectories; and (2) a decision-enhanced training architecture balancing task-specific proficiency with transferable general reasoning. Experiments demonstrate that our model achieves competitive performance in both Doudizhu and Go, while exhibiting substantial improvements in logical reasoning, strategic planning, and other foundational cognitive skills, validating the positive transfer effect of decision-centric training on general reasoning abilities.

📝 Abstract
Large Language Models (LLMs) have exhibited impressive capabilities across numerous domains, yet they often struggle with complex reasoning and decision-making tasks. Decision-making games, which inherently require multifaceted reasoning logic, serve as ideal sandboxes for evaluating and enhancing the reasoning abilities of LLMs. In this work, we first explore whether LLMs can master complex decision-making games through targeted post-training. To this end, we design data synthesis strategies and curate extensive offline datasets from two classic games, Doudizhu and Go. We further develop a suite of techniques to effectively incorporate this data into LLM training, resulting in two novel agents: Mastermind-Dou and Mastermind-Go. Our experimental results demonstrate that these Mastermind LLMs achieve competitive performance in their respective games. Additionally, we explore whether integrating decision-making data can enhance the general reasoning abilities of LLMs. Our findings suggest that such post-training improves certain aspects of reasoning, providing valuable insights for optimizing LLM data collection and synthesis strategies.
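The abstract describes the data synthesis pipeline only at a high level: self-play or rule-guided agents generate game trajectories, which are then converted into offline instruction-tuning data. A minimal sketch of that conversion step might look like the following; all names here (`Step`, `make_examples`, the prompt wording) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: flattening a self-play game trajectory into
# (prompt, response) instruction-tuning pairs. The paper's real pipeline
# (rule-guided generation, distillation, multi-stage tuning) is more involved.
from dataclasses import dataclass


@dataclass
class Step:
    observation: str   # textual game state, e.g. a Doudizhu hand
    action: str        # move chosen by the self-play/rule-based agent
    rationale: str     # explanation distilled from the engine (assumed)


def make_examples(trajectory: list[Step]) -> list[dict]:
    """Turn each step of one game into a supervised training example."""
    examples = []
    for step in trajectory:
        prompt = f"Game state:\n{step.observation}\nWhat is the best move?"
        response = f"{step.rationale}\nAction: {step.action}"
        examples.append({"prompt": prompt, "response": response})
    return examples


traj = [Step("Hand: 3 3 4 5", "play pair of 3s", "A pair keeps tempo.")]
data = make_examples(traj)
print(data[0]["response"])
```

Pairing each action with a distilled rationale, rather than the action alone, is one plausible way such data could teach the model the reasoning behind a move and not just the move itself.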
Problem

Research questions and friction points this paper is trying to address.

Enhance LLMs' complex reasoning in decision games
Develop data synthesis for LLM post-training
Improve general reasoning via decision-making data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Algorithmic data synthesis for LLM training
Post-training LLMs with curated game datasets
Developing Mastermind agents for decision games
Haolin Wang
Ph.D. Student, Georgia Institute of Technology (infrastructure monitoring, asset management, AI, ML, computer vision)
Xueyan Li
Shanghai Artificial Intelligence Laboratory
Yazhe Niu
Shanghai Artificial Intelligence Laboratory; The Chinese University of Hong Kong
Shuai Hu
Siberian Branch of the Russian Academy of Sciences (ML, Psychology)
Hongsheng Li
The Chinese University of Hong Kong