InvAASTCluster: On Applying Invariant-Based Program Clustering to Introductory Programming Assignments

📅 2022-06-28
🏛️ arXiv.org
📈 Citations: 7
Influential citations: 0
🤖 AI Summary
Traditional syntactic clustering methods, based on abstract syntax trees (ASTs), control-flow graphs, or data-flow analyses, struggle to recognize functional equivalence among the many syntactically diverse but semantically equivalent correct programs that students submit for introductory programming assignments (IPAs). This limits the efficiency of clustering-based automated repair. To address this, we propose a novel semantics-aware program clustering method that combines dynamically mined program invariants with anonymized abstract syntax trees (AASTs). The program representation jointly encodes runtime invariants and AAST structure, so that programs with the same behavior are represented similarly even when their code differs. These representations drive the clustering step of clustering-based automated repair. Experiments show that the method outperforms purely syntactic representations when clustering semantically equivalent correct programs. When integrated into a state-of-the-art clustering-based repair tool, it repairs around 13% more student programs in less time.
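The joint representation can be illustrated with a small sketch. It is not the paper's encoding or clustering algorithm, neither of which this summary specifies. Assumed here, purely for illustration: Python as the program language; a bag of AST node-type counts as a crude stand-in for the AAST (node-type names carry no identifiers); invariants supplied as strings by some dynamic invariant detector; a hypothetical weight `INV_WEIGHT` that lets invariant features dominate; and greedy leader clustering on cosine similarity. The names `embed`, `cosine`, and `cluster` are likewise hypothetical.

```python
import ast
import math
from collections import Counter

# Hypothetical weight: how strongly each invariant counts relative to one AST node.
INV_WEIGHT = 10


def embed(source: str, invariants: frozenset[str]) -> Counter:
    """Hypothetical joint feature bag: AST node-type counts plus weighted invariants."""
    # Node-type names (Name, For, BinOp, ...) never contain identifiers,
    # so this bag is already insensitive to variable and function names.
    feats = Counter(f"ast:{type(node).__name__}" for node in ast.walk(ast.parse(source)))
    for inv in invariants:
        feats[f"inv:{inv}"] += INV_WEIGHT
    return feats


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature bags."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(embeddings: list[Counter], threshold: float = 0.9) -> list[list[int]]:
    """Greedy leader clustering: join the first cluster whose leader is similar enough."""
    clusters: list[list[int]] = []
    for i, e in enumerate(embeddings):
        for members in clusters:
            if cosine(e, embeddings[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

For example, a loop-based sum, the same loop with renamed identifiers, and the closed form `n * (n + 1) // 2` (all given the same invariant set) end up in one cluster, while a doubling function with a different invariant set stays apart. Lowering `INV_WEIGHT` shifts the balance back toward syntax, and at a weight of 1 the closed-form sum no longer joins the loop-based cluster.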
📝 Abstract
Due to the vast number of students enrolled in programming courses, there has been an increasing number of automated program repair techniques focused on introductory programming assignments (IPAs). Typically, such techniques use program clustering to take advantage of previous correct student implementations to repair a new incorrect submission. These repair techniques use clustering because analyzing all available correct submissions to repair a program is not feasible. However, conventional clustering methods rely on program representations based on features such as abstract syntax trees (ASTs), syntax, control flow, and data flow. This paper proposes InvAASTCluster, a novel approach for program clustering that uses dynamically generated program invariants to cluster semantically equivalent IPAs. InvAASTCluster's program representation combines the program's semantics, through its invariants, with its structure, through its anonymized abstract syntax tree (AAST). Invariants denote conditions that must remain true during program execution, while AASTs are ASTs devoid of variable and function names, retaining only their types. Our experiments show that the proposed program representation outperforms syntax-based representations when clustering a set of correct IPAs. Furthermore, we integrate InvAASTCluster into a state-of-the-art clustering-based program repair tool. Our results show that InvAASTCluster advances the current state of the art when used by clustering-based repair tools, repairing around 13% more students' programs in a shorter amount of time.
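The idea of an AAST can be sketched with Python's standard `ast` module, used here only as an illustrative stand-in for the paper's program language and AST tooling. The hypothetical `Anonymizer` class replaces function names, parameter names, and variable references with fixed placeholders. Python has no mandatory static types, so the sketch keeps only explicit annotations (such as `int` or `list`) as the type information that survives anonymization. This is a simplification of the AASTs described in the abstract.

```python
import ast


class Anonymizer(ast.NodeTransformer):
    """Hypothetical AAST builder: strip identifiers, keep structure and annotations."""

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = "FUNC"
        returns, node.returns = node.returns, None  # protect the return annotation
        self.generic_visit(node)
        node.returns = returns
        return node

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = "VAR"  # annotation is left untouched, so its type name survives
        return node

    def visit_AnnAssign(self, node: ast.AnnAssign) -> ast.AST:
        node.target = self.visit(node.target)  # anonymize the target, not the annotation
        if node.value is not None:
            node.value = self.visit(node.value)
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        node.id = "VAR"  # every other identifier, including called functions
        return node


def anonymized_ast(source: str) -> str:
    """Return a name-free dump of the program's AST (line numbers excluded)."""
    return ast.dump(Anonymizer().visit(ast.parse(source)))
```

Two submissions that differ only in identifier names produce identical dumps. A structurally different solution, such as an index-based `while` loop instead of a `for` loop, does not. That is the gap the invariant side of the representation is meant to close.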
Problem

Clustering introductory programming assignments using invariant-based semantics
Improving program repair by leveraging dynamic invariants and anonymized ASTs
Enhancing correctness and efficiency of automated IPA repair tools
Innovation

Uses dynamically generated program invariants for clustering
Combines program semantics and anonymized AST structure
Improves repair tool efficiency and success rate
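Dynamic invariant generation can be illustrated with a toy sketch in the style of a dynamic invariant detector. The sketch runs a program on sample inputs, records each (input, result) pair, and keeps every candidate property from a fixed list that held on all runs. The template list `TEMPLATES`, the helper `mine_invariants`, and the three example programs are hypothetical. Real detectors (Daikon is the classic example) check far richer template grammars over many variables. Because the result reflects only the sampled runs, such invariants are likely, not proven, properties.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable

# Hypothetical candidate templates over one integer input n and the result ret.
TEMPLATES: dict[str, Callable[[int, int], bool]] = {
    "ret >= 0": lambda n, ret: ret >= 0,
    "ret >= n": lambda n, ret: ret >= n,
    "ret == n": lambda n, ret: ret == n,
    "ret % 2 == 0": lambda n, ret: ret % 2 == 0,
    "ret <= n * n": lambda n, ret: ret <= n * n,
}


def mine_invariants(program: Callable[[int], int], inputs: Iterable[int]) -> frozenset[str]:
    """Keep every template that holds on all observed (input, result) pairs."""
    traces = [(n, program(n)) for n in inputs]
    return frozenset(
        name for name, holds in TEMPLATES.items()
        if all(holds(n, ret) for n, ret in traces)
    )


def sum_loop(n: int) -> int:
    total = 0
    for i in range(n + 1):
        total += i
    return total


def sum_formula(n: int) -> int:
    return n * (n + 1) // 2


def double(n: int) -> int:
    return 2 * n


if __name__ == "__main__":
    for prog in (sum_loop, sum_formula, double):
        print(prog.__name__, sorted(mine_invariants(prog, range(11))))
```

On inputs 0 to 10, the two sum implementations differ syntactically but yield the same invariant set, `ret >= 0`, `ret >= n`, and `ret <= n * n`. The doubling function yields a different set. That difference is what lets invariants group semantically equivalent submissions.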
Pedro Orvalho
University of Oxford
Automated Reasoning, Automated Verification, Program Repair, Computer-aided Education
Mikoláš Janota
CTU Prague
SMT, Machine learning, Quantifiers, Formal Methods
Vasco M. Manquinho
INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol 9, Lisboa, 1000-029, Portugal