🤖 AI Summary
This study addresses the automatic assessment of comprehension levels in novice programmers' responses to "Explain in Plain English" tasks, focusing on distinguishing between multi-structural (line-by-line description) and relational (holistic, purpose-level explanation) responses. We propose a zero-shot, LLM-driven segmentation method that compares the number of segments in a student's explanation against the number of segments in the code itself, using this ratio as a proxy for comprehension level without any fine-tuning. The approach segments both the source code and the natural-language explanation, and its classifications achieve substantial agreement with human expert labels. The resulting lightweight, open-source Python toolkit enables interpretable formative feedback. To our knowledge, this is the first work to systematically leverage segmentation behavior for comprehension-level classification in programming education.
📝 Abstract
Reading and understanding code are fundamental skills for novice programmers, and they are increasingly important given the growing prevalence of AI-generated code and the need to evaluate its accuracy and reliability. "Explain in Plain English" questions are a widely used approach for assessing code comprehension, but providing automated feedback, particularly on comprehension levels, remains challenging. This paper introduces a novel method for automatically assessing the comprehension level of responses to "Explain in Plain English" questions. Central to this is the ability to distinguish between two response types: multi-structural, where students describe the code line by line, and relational, where they explain the code's overall purpose. Using a Large Language Model (LLM) to segment both the student's description and the code, we determine whether the student describes each line individually (many segments) or the code as a whole (fewer segments). We evaluate the approach's effectiveness by comparing its segmentation results with human classifications, achieving substantial agreement. We conclude by discussing how this approach, which we release as an open-source Python package, could be used as a formative feedback mechanism.
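To make the segmentation-count idea concrete, here is a minimal sketch in Python. The prompt wording, model name, and classification threshold are illustrative assumptions, not the released package's actual implementation; it uses the OpenAI client purely as one example of an LLM backend.

```python
# Minimal sketch of the segment-count comparison described in the abstract.
# Prompts, model choice, and threshold are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def count_segments(text: str, kind: str) -> int:
    """Ask an LLM to split `text` into coherent segments and return the count."""
    prompt = (
        f"Split the following {kind} into coherent segments, "
        f"one segment per line. Output only the segments.\n\n{text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    lines = response.choices[0].message.content.strip().splitlines()
    return sum(1 for line in lines if line.strip())


def classify(explanation: str, code: str, threshold: float = 0.5) -> str:
    """Label an explanation as 'multi-structural' or 'relational'.

    An explanation with nearly as many segments as the code suggests a
    line-by-line description (multi-structural); far fewer segments
    suggest a holistic summary (relational). The threshold of 0.5 is an
    illustrative assumption.
    """
    ratio = count_segments(explanation, "explanation") / max(
        count_segments(code, "program"), 1
    )
    return "multi-structural" if ratio >= threshold else "relational"
```

A single ratio like this keeps the feedback interpretable: a student can be shown how many segments their explanation contains relative to the code, rather than an opaque model score.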