🤖 AI Summary
Existing code comprehension assessments struggle to disentangle high-level understanding of functional intent from low-level implementation details. Method: This study proposes function naming (FN) as a novel assessment paradigm, a modification of traditional "Explain in Plain English" (EiPE) tasks, to specifically evaluate students' grasp of code functionality. It is the first to apply Item Response Theory (IRT) to FN, rigorously establishing its reliability and validity. The authors develop an open-source, scalable Python auto-grading toolkit that integrates LLM-assisted scoring with unit-test-based verification. Results: Evaluated in authentic introductory programming courses, FN achieves strong agreement with human EiPE scoring (Spearman ρ = 0.89), effectively discriminates among varying comprehension levels, and enables large-scale, objective, fine-grained assessment of code understanding.
📝 Abstract
"Explain in Plain English" (EiPE) questions are widely used to assess code comprehension skills but are challenging to grade automatically. Recent approaches like Code Generation Based Grading (CGBG) leverage large language models (LLMs) to generate code from student explanations and validate its equivalence to the original code using unit tests. However, this approach does not differentiate between high-level, purpose-focused responses and low-level, implementation-focused ones, limiting its effectiveness in assessing comprehension levels. We propose a modified approach in which students generate function names, emphasizing the function's purpose over its implementation details. We evaluate this method in an introductory programming course and analyze it using Item Response Theory (IRT) to understand its effectiveness as exam items and its alignment with traditional EiPE grading standards. We also publish this work as an open-source Python package for autograding EiPE questions, providing a scalable solution for adoption.
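The grading loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual package: `generate_code_from_name` is a hypothetical stand-in for the LLM call (stubbed here with a canned response so the sketch runs offline), and the reference function, test inputs, and all names are invented for the example.

```python
# Sketch of function-name (FN) grading in the CGBG style: an LLM is asked
# to write a function matching the student's proposed name, and unit tests
# check behavioral equivalence against the instructor's reference solution.

def generate_code_from_name(name: str) -> str:
    """Hypothetical LLM call: 'write a Python function that does what
    this name suggests'. Stubbed with a fixed response for illustration."""
    return (
        "def candidate(xs):\n"
        "    return sum(xs) / len(xs)\n"
    )

def reference(xs):
    # Instructor's reference solution (here: arithmetic mean).
    return sum(xs) / len(xs)

def grade(student_name: str, test_inputs) -> bool:
    """True if code regenerated from the student's function name behaves
    like the reference on every unit-test input."""
    namespace = {}
    exec(generate_code_from_name(student_name), namespace)
    candidate = namespace["candidate"]
    return all(candidate(x) == reference(x) for x in test_inputs)

print(grade("average_of_list", [[1, 2, 3], [10, 20]]))  # True with this stub
```

Because the LLM sees only the name, a vague or implementation-level name (e.g. `loop_and_divide`) is less likely to regenerate equivalent code than a purpose-level one, which is what lets unit-test pass rates proxy for comprehension level.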