🤖 AI Summary
This study addresses automated, fine-grained error localization in long mathematical solutions. It introduces a two-stage, LLM-based diagnostic paradigm: first, prompt-guided chain-of-thought reasoning checks each step for logical errors; second, cross-modal error attribution combines computer-vision (CV) formula recognition with multi-step consistency verification. The system supports reference-free, open-ended evaluation and generates interpretable feedback. Key contributions include the first prompt-driven stepwise diagnostic framework, an open scoring mechanism that needs no reference answers, and a vision–language collaborative error attribution model. Evaluated on calculation and word problems, the system achieves 92.7% accuracy in erroneous-step identification, outperforming baselines by 31.4 percentage points. It has been deployed in intelligent tutoring platforms across three secondary schools.
📝 Abstract
We propose a novel system, MathMistake Checker, designed to automate step-by-step mistake finding in mathematical problems with lengthy answers through a two-stage process. The system aims to simplify grading, increase efficiency, and enhance learning experiences from a pedagogical perspective. It integrates advanced technologies, including computer vision and the chain-of-thought capabilities of the latest large language models (LLMs). Our system supports open-ended grading without reference answers and promotes personalized learning by providing targeted feedback. We demonstrate its effectiveness across various types of math problems, such as calculation and word problems.
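To make the step-by-step idea concrete, the following is a minimal sketch of reference-free checking on a calculation problem. It is not the MathMistake Checker implementation: the names `transcribe_steps`, `check_step`, and `find_mistakes` are hypothetical, the vision stage is replaced by plain text input, and the LLM chain-of-thought check is replaced by a simple arithmetic comparison of both sides of each `=`.

```python
import ast
import operator
from dataclasses import dataclass
from typing import List

# Assumption: only basic arithmetic is supported in this sketch.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval(node):
    """Evaluate a small arithmetic AST without calling eval()."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


@dataclass
class StepFeedback:
    index: int    # 1-based position of the step in the answer
    text: str     # the step as written
    ok: bool      # False when the step is flagged as a mistake
    message: str  # short feedback for the student


def transcribe_steps(answer_text: str) -> List[str]:
    """Hypothetical stand-in for stage 1 (a real system would read an image)."""
    return [line.strip() for line in answer_text.splitlines() if line.strip()]


def check_step(index: int, step: str) -> StepFeedback:
    """Hypothetical stand-in for stage 2: compare the sides of each '='."""
    parts = step.split("=")
    if len(parts) < 2:
        return StepFeedback(index, step, True, "no equality to check")
    try:
        values = [_eval(ast.parse(p.strip(), mode="eval")) for p in parts]
    except (ValueError, SyntaxError, ZeroDivisionError):
        return StepFeedback(index, step, True, "not checkable by this sketch")
    for left, right in zip(values, values[1:]):
        if abs(left - right) > 1e-9:
            return StepFeedback(index, step, False,
                                f"{left} does not equal {right}")
    return StepFeedback(index, step, True, "consistent")


def find_mistakes(answer_text: str) -> List[StepFeedback]:
    """Run both stages and return feedback for every step."""
    steps = transcribe_steps(answer_text)
    return [check_step(i, s) for i, s in enumerate(steps, start=1)]
```

Calling `find_mistakes("3 * (4 + 5) = 3 * 9\n3 * 9 = 28\n28 - 8 = 20")` flags only the second step, because each line is judged on its own internal consistency and no reference answer is consulted. Steps the sketch cannot parse, such as word-problem sentences, are passed through unflagged; that is where a language model would be needed.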