Explore the Reasoning Capability of LLMs in the Chess Testbed

📅 2024-11-11

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the limited ability of large language models (LLMs) to carry out long-term, complex reasoning, with chess as the testbed, this paper proposes augmenting models with annotated strategy and tactics expressed as language explanations. The authors (1) collect MATE, a dataset of 1 million chess positions whose candidate moves are annotated by chess experts for strategy and tactics; (2) finetune LLaMA-3-8B on this data; and (3) evaluate on the task of selecting the better chess move, where the finetuned models outperform commercial GPT, Claude, and Gemini models. The results indicate that language explanations can enhance the reasoning capability of LLMs.

📝 Abstract

Reasoning is a central capability of human intelligence. In recent years, with the advent of large-scale datasets, pretrained large language models have emerged with new capabilities, including reasoning. However, these models still struggle with long-term, complex reasoning tasks, such as playing chess. Based on the observation that expert chess players employ a dual approach combining long-term strategic play with short-term tactical play along with language explanation, we propose improving the reasoning capability of large language models in chess by integrating annotated strategy and tactics. Specifically, we collect a dataset named MATE, which consists of 1 million chess positions with candidate moves annotated by chess experts for strategy and tactics. We finetune the LLaMA-3-8B model and compare it against state-of-the-art commercial language models in the task of selecting better chess moves. Our experiments show that our models perform better than GPT, Claude, and Gemini models. We find that language explanations can enhance the reasoning capability of large language models.
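The move-selection setup described in the abstract can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Candidate` and `Example` classes, their field names, the prompt wording, and the sample position are hypothetical and are not taken from the MATE dataset or the authors' code. The sketch only shows the general shape of the task: a position, candidate moves with a strategy and a tactic explanation each, and an accuracy score over the model's choices.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    move: str      # move in standard algebraic notation, e.g. "Nf3"
    strategy: str  # long-term strategic explanation (hypothetical field)
    tactic: str    # short-term tactical explanation (hypothetical field)


@dataclass
class Example:
    fen: str                     # board position in FEN
    candidates: list[Candidate]  # candidate moves to choose between
    better: str                  # label: the move judged better


def build_prompt(ex: Example) -> str:
    """Render one example as a text prompt (the wording is an assumption)."""
    lines = [f"Position (FEN): {ex.fen}", "Candidate moves:"]
    for c in ex.candidates:
        lines.append(f"- {c.move}. Strategy: {c.strategy} Tactic: {c.tactic}")
    lines.append("Which move is better? Answer with the move only.")
    return "\n".join(lines)


def accuracy(examples, predict) -> float:
    """Fraction of examples where predict(prompt) returns the labeled move."""
    if not examples:
        return 0.0
    correct = sum(
        predict(build_prompt(ex)).strip() == ex.better for ex in examples
    )
    return correct / len(examples)


if __name__ == "__main__":
    # Made-up example for illustration only; not a record from MATE.
    ex = Example(
        fen="rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w KQkq - 0 2",
        candidates=[
            Candidate("Nf3", "Develops a piece toward the center.",
                      "Attacks the pawn on e5."),
            Candidate("Qh5", "Brings the queen out early.",
                      "Attacks the pawn on e5."),
        ],
        better="Nf3",
    )
    print(build_prompt(ex))
    # A stub stands in for a finetuned model here.
    print(accuracy([ex], lambda prompt: "Nf3"))
```

Here `predict` is any callable that maps a prompt string to a move string, so a finetuned model or a commercial API could be evaluated by the same function.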

Problem

Research questions and friction points this paper is trying to address.

Enhance LLMs' reasoning in chess using annotated strategy and tactics.

Address LLMs' struggle with long-term, complex reasoning tasks like chess.

Improve move selection in chess by integrating expert-annotated data.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates expert-annotated chess strategy and tactics into LLM training.

Finetunes LLaMA-3-8B on the expert-annotated MATE dataset.

Enhances reasoning via language explanations.
