MathMistake Checker: A Comprehensive Demonstration for Step-by-Step Math Problem Mistake Finding by Prompt-Guided LLMs

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of automated fine-grained error localization in lengthy mathematical reasoning processes. Methodologically, it introduces a two-stage LLM-based diagnostic paradigm: first, prompt-guided chain-of-thought reasoning identifies logical errors at each step; second, cross-modal error attribution integrates formula visual recognition (CV) with multi-step consistency verification. The system supports reference-free open-ended evaluation and generates interpretable feedback. Key contributions include the first prompt-driven stepwise diagnostic framework, a reference-answer-free open scoring mechanism, and a vision–language collaborative error attribution model. Evaluated on computational and word problems, the system achieves 92.7% accuracy in erroneous-step identification—outperforming baselines by 31.4 percentage points. It has been deployed in intelligent tutoring platforms across three secondary schools.

📝 Abstract
We propose a novel system, MathMistake Checker, designed to automate step-by-step mistake finding in mathematical problems with lengthy answers through a two-stage process. The system aims to simplify grading, increase efficiency, and enhance learning experiences from a pedagogical perspective. It integrates advanced technologies, including computer vision and the chain-of-thought capabilities of the latest large language models (LLMs). Our system supports open-ended grading without reference answers and promotes personalized learning by providing targeted feedback. We demonstrate its effectiveness across various types of math problems, such as calculation and word problems.
Problem

Research questions and friction points this paper is trying to address.

Automates step-by-step mistake finding in math problems
Simplifies grading and enhances learning experiences
Supports open-ended grading and personalized feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage process for mistake finding
Integrates computer vision and LLMs
Supports open-ended grading and feedback
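
The prompt-guided, reference-free step check described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the JSON reply format, and the names split_steps, build_prompt, find_mistake, and fake_llm are all assumptions made for this example, and the computer-vision stage that would transcribe a handwritten answer is left out. The model is passed in as a plain callable so the sketch runs without any external service.

```python
import json
from typing import Callable, Dict, List


def split_steps(answer: str) -> List[str]:
    """Treat each non-empty line of the student's answer as one step."""
    return [line.strip() for line in answer.splitlines() if line.strip()]


def build_prompt(problem: str, steps: List[str]) -> str:
    """Build a grading prompt that contains no reference answer (hypothetical wording)."""
    numbered = "\n".join(f"Step {i}: {s}" for i, s in enumerate(steps, start=1))
    return (
        "You are grading a student's solution. No reference answer is given.\n"
        "Reason through each step in order, then reply with JSON only, in the form "
        '{"first_wrong_step": <step number or null>, "feedback": "<short hint>"}.\n\n'
        f"Problem: {problem}\n{numbered}\n"
    )


def find_mistake(problem: str, answer: str, llm: Callable[[str], str]) -> Dict:
    """Ask the model for the first wrong step and map it back to the student's text."""
    steps = split_steps(answer)
    verdict = json.loads(llm(build_prompt(problem, steps)))
    idx = verdict.get("first_wrong_step")
    if idx is not None and not (1 <= idx <= len(steps)):
        raise ValueError(f"model pointed at a step that does not exist: {idx}")
    return {
        "step_index": idx,
        "step_text": steps[idx - 1] if idx is not None else None,
        "feedback": verdict.get("feedback", ""),
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same canned verdict."""
    return json.dumps(
        {"first_wrong_step": 2, "feedback": "Subtracting 4 from 12 gives 8, not 9."}
    )


# The student's second line is wrong: 3x + 4 = 12 implies 3x = 8.
result = find_mistake("Solve 3x + 4 = 12.", "3x + 4 = 12\n3x = 9\nx = 3", fake_llm)
print(result["step_index"], "|", result["step_text"], "|", result["feedback"])
```

Asking only for the first wrong step keeps the reply easy to parse and gives the student one targeted hint; a real system would swap fake_llm for an actual model call and would need to handle replies that are not valid JSON.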
Authors

Tianyang Zhang
Learnable.ai, Shanghai, China
Zhuoxuan Jiang
IBM Research
NLP, Machine Learning
Haotian Zhang
Learnable.ai, Shanghai, China
Lin Lin
UCloud Technology Co. Ltd., Shanghai, China
Shaohua Zhang
Shanghai Business School, Shanghai, China