Reasoning Trajectories for Socratic Debugging of Student Code: From Misconceptions to Contradictions and Updated Beliefs

📅 2025-10-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Novice programmers frequently introduce coding errors caused by conceptual misunderstandings about programming. Method: The paper proposes “Socratic Debugging,” a pedagogical approach that avoids direct error correction and instead induces cognitive conflict through guided Reasoning Trajectories (RTs), prompting students to identify and revise their own flawed mental models. The authors formally define the RT generation task, construct a dataset of debugging problems manually annotated with RTs, and introduce a framework that couples LLM-based RT generation with Socratic dialogue anchored on those trajectories. Evaluation uses LLM-as-judge for scalable, automated assessment. Contribution/Results: Experiments show that frontier models generate up to 91% correct reasoning trajectories and 98.7% valid conversation turns, supporting students’ self-debugging. The work establishes a modelable and evaluable paradigm for cognitive intervention in programming education.

📝 Abstract
In Socratic debugging, instructors guide students towards identifying and fixing a bug on their own, instead of providing the bug fix directly. Most novice programmer bugs are caused by programming misconceptions, namely false beliefs about a programming concept. In this context, Socratic debugging can be formulated as a guided Reasoning Trajectory (RT) leading to a statement about the program behavior that contradicts the bug-causing misconception. Upon reaching this statement, the ensuing cognitive dissonance leads the student to first identify and then update their false belief. In this paper, we introduce the task of reasoning trajectory generation, together with a dataset of debugging problems manually annotated with RTs. We then describe LLM-based solutions for generating RTs and Socratic conversations that are anchored on them. A large-scale LLM-as-judge evaluation shows that frontier models can generate up to 91% correct reasoning trajectories and 98.7% valid conversation turns.
Problem

Research questions and friction points this paper is trying to address.

Identify programming misconceptions causing student bugs
Generate reasoning trajectories for Socratic debugging guidance
Develop LLM-based solutions for debugging conversations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generating reasoning trajectories for debugging
Using LLMs to create Socratic conversations
Anchoring dialogue on misconception contradictions
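The anchoring idea in the bullets above can be sketched as a small loop: each tutor turn is tied to one verifiable RT step, and the final turn surfaces the statement that contradicts the misconception. This is an illustrative stand-in, not the paper's implementation; the `ReasoningTrajectory` class and both function names are hypothetical, and a real system would have an LLM generate the RT and phrase the questions.

```python
from dataclasses import dataclass

@dataclass
class ReasoningTrajectory:
    misconception: str   # the false belief causing the bug
    steps: list[str]     # program facts the student can verify themselves
    contradiction: str   # final statement that conflicts with the belief

def generate_rt(misconception: str, program_facts: list[str]) -> ReasoningTrajectory:
    """Assemble an RT: verifiable observations about the program,
    ending in a statement that contradicts the misconception."""
    return ReasoningTrajectory(
        misconception=misconception,
        steps=program_facts[:-1],
        contradiction=program_facts[-1],
    )

def socratic_turns(rt: ReasoningTrajectory) -> list[str]:
    """Anchor each dialogue turn on one RT step: ask about the fact
    rather than stating it, so the student derives it on their own."""
    turns = [f"What happens when {step}?" for step in rt.steps]
    turns.append(
        f"Given that, can it still be true that {rt.misconception}? "
        f"Notice: {rt.contradiction}"
    )
    return turns

# Example: the common misconception that range(1, n) includes n.
rt = generate_rt(
    misconception="range(1, n) includes n",
    program_facts=[
        "you print list(range(1, 4))",
        "the output is [1, 2, 3]",
        "range(1, 4) stops before 4, so the loop body never runs for n",
    ],
)
for turn in socratic_turns(rt):
    print(turn)
```

The last turn is the contradiction anchor: the student has already confirmed each preceding fact, so the cognitive dissonance the abstract describes arises from their own answers rather than from a correction by the tutor.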