Explaining Puzzle Solutions in Natural Language: An Exploratory Study on 6x6 Sudoku

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the capacity of mainstream large language models (LLMs), including GPT-4, Claude-3, and Llama-3, to jointly solve 6×6 Sudoku puzzles and generate strategic, stepwise, human-interpretable natural language explanations, focusing on explainability rather than answer correctness alone. Method: zero-shot and few-shot prompting experiments evaluate both solution accuracy and explanation quality via human assessment and logical-consistency analysis. Contribution/Results: only one model demonstrates baseline puzzle-solving capability, and none reliably produces explanations reflecting heuristic strategies, incremental reasoning, or cognitive accessibility. To the authors' knowledge, this is the first empirical study to assess explanation quality, specifically strategic interpretability, in structured reasoning tasks. The findings expose a fundamental limitation in current LLMs' ability to articulate deliberate, pedagogically sound reasoning processes, and the work establishes an evaluation benchmark for trustworthy human-AI collaborative decision-making that emphasizes transparency, strategy awareness, and explanatory fidelity over output correctness alone.
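The summary mentions zero-shot and few-shot prompting but does not reproduce the actual prompts. As a rough illustration of the setup (all wording, function names, and the example format are hypothetical, not the authors'), a few-shot prompt for this task might be assembled like this:

```python
def build_prompt(puzzle, examples=()):
    """Build a prompt asking an LLM to solve a 6x6 Sudoku and explain each step.

    puzzle: 6x6 list of ints, 0 marking an empty cell.
    examples: iterable of (example_puzzle, worked_answer) pairs; empty for
    zero-shot, non-empty for few-shot prompting.
    """
    def fmt(grid):
        # Render the grid as six space-separated rows.
        return "\n".join(" ".join(str(v) for v in row) for row in grid)

    parts = ["Solve this 6x6 Sudoku (2x3 boxes). "
             "Explain each deduction step by step."]
    for ex_puzzle, ex_answer in examples:  # few-shot demonstrations, if any
        parts += ["Puzzle:", fmt(ex_puzzle), "Worked answer:", ex_answer]
    parts += ["Puzzle:", fmt(puzzle), "Your answer:"]
    return "\n\n".join(parts)
```

With `examples=()` this degenerates to the zero-shot case: a single instruction followed by the target puzzle.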

📝 Abstract
The success of Large Language Models (LLMs) in human-AI collaborative decision-making hinges on their ability to provide trustworthy, gradual, and tailored explanations. Solving complex puzzles, such as Sudoku, offers a canonical example of this collaboration, where clear and customized explanations often hold greater importance than the final solution. In this study, we evaluate the performance of five LLMs in solving and explaining 6×6 Sudoku puzzles. While one LLM demonstrates limited success in solving puzzles, none can explain the solution process in a manner that reflects strategic reasoning or intuitive problem-solving. These findings underscore significant challenges that must be addressed before LLMs can become effective partners in human-AI collaborative decision-making.
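For context on the task itself (this is a generic reference sketch, not the authors' code): a 6×6 Sudoku uses digits 1-6 with 2-row × 3-column boxes, and is solvable by plain backtracking, which makes it a good testbed where the explanation, not the answer, is the hard part:

```python
def solve_6x6(grid):
    """Backtracking solver for 6x6 Sudoku with 2x3 boxes.

    grid: 6x6 list of ints, 0 = empty. Fills the grid in place and
    returns True if a valid completion exists, False otherwise.
    """
    def ok(r, c, v):
        # v must not already appear in row r, column c, or the 2x3 box.
        if any(grid[r][j] == v for j in range(6)):
            return False
        if any(grid[i][c] == v for i in range(6)):
            return False
        br, bc = (r // 2) * 2, (c // 3) * 3  # top-left corner of the box
        return all(grid[br + i][bc + j] != v
                   for i in range(2) for j in range(3))

    for r in range(6):
        for c in range(6):
            if grid[r][c] == 0:
                for v in range(1, 7):
                    if ok(r, c, v):
                        grid[r][c] = v
                        if solve_6x6(grid):
                            return True
                        grid[r][c] = 0  # undo and try the next digit
                return False  # no digit fits this cell: backtrack
    return True  # no empty cells left
```

A solver like this yields only the filled grid; the paper's point is that human-oriented explanations require naming the *strategy* behind each placement (e.g. "6 is the only digit left for this box"), which the evaluated LLMs failed to do.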
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to solve and explain Sudoku puzzles
Assessing LLMs' strategic reasoning in puzzle solution explanations
Identifying challenges for LLMs in human-AI collaborative decision-making
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluating five LLMs on Sudoku solving
Assessing explanation quality of LLMs
Identifying gaps in strategic reasoning