🤖 AI Summary
In online mathematics learning, students often lack immediate, personalized feedback on their problem-solving. To address this, the paper introduces MathEDU, a dataset of authentic student solutions annotated with teacher feedback, and explores the use of large language models (LLMs) both to assess answer correctness and to generate adaptive, pedagogically grounded stepwise feedback. The approach is evaluated in two scenarios: one in which the model has access to a student's prior answer history, and a cold-start setting without it. Experiments show that the fine-tuned model performs well at identifying answer correctness and can produce coherent feedback even in the cold-start setting, but generating detailed, pedagogically useful feedback remains challenging. Key contributions include: (1) the MathEDU dataset of real student solutions with teacher-feedback annotations; (2) a framework that jointly addresses correctness assessment and adaptive feedback generation; and (3) an analysis of LLM capabilities and limitations for intelligent mathematics education.
📝 Abstract
Online learning enhances educational accessibility, offering students the flexibility to learn anytime, anywhere. However, a key limitation is the lack of immediate, personalized feedback, particularly in helping students correct errors in math problem-solving. Several studies have investigated applications of large language models (LLMs) in educational contexts. In this paper, we explore the capabilities of LLMs to assess students' math problem-solving processes and provide adaptive feedback. We introduce the MathEDU dataset, comprising authentic student solutions annotated with teacher feedback. We evaluate the model's ability to support personalized learning in two scenarios: one where the model has access to students' prior answer histories, and another simulating a cold-start context. Experimental results show that the fine-tuned model performs well in identifying answer correctness. However, the model still faces challenges in generating detailed feedback for pedagogical purposes.