🤖 AI Summary
Students learning to code often hold misconceptions about core programming concepts, which can lead to bugs and inefficient code and slow the learning of related concepts. This paper introduces McMining, the task of automatically mining such misconceptions from samples of a student's code. To support training and evaluation, the authors construct an extensible benchmark dataset of misconceptions paired with a large set of code samples in which those misconceptions are manifested, and they design two LLM-based McMiner approaches. The contributions are threefold: (1) a formulation of the misconception-mining task; (2) an extensible, structured misconception benchmark with accompanying code samples; and (3) an extensive empirical evaluation showing that models from the Gemini, Claude, and GPT families are effective at discovering misconceptions in student code. These results point toward new forms of personalized programming feedback and pedagogical intervention.
📝 Abstract
When learning to code, students often develop misconceptions about various programming language concepts. These can not only lead to bugs or inefficient code, but also slow down the learning of related concepts. In this paper, we introduce McMining, the task of mining programming misconceptions from samples of a student's code. To enable the training and evaluation of McMining systems, we develop an extensible benchmark dataset of misconceptions together with a large set of code samples in which these misconceptions are manifested. We then introduce two LLM-based McMiner approaches and, through extensive evaluations, show that models from the Gemini, Claude, and GPT families are effective at discovering misconceptions in student code.