🤖 AI Summary
This study addresses naming confusion caused by similar identifier names and its adverse effects on code comprehension, maintainability, and developer collaboration. Because prior work lacks a systematic classification framework, we propose the first taxonomy of identifier name similarity, spanning semantic, orthographic, and contextual dimensions and designed for both theoretical rigor and practical scalability. Through empirical analysis of naming patterns across large-scale open-source projects, we identify six high-frequency similarity categories (e.g., spelling variants, abbreviation conflicts, and semantic near-synonyms) and validate their prevalence and detrimental impact in real-world codebases. The taxonomy provides a reusable theoretical foundation and methodological support for assessing identifier naming quality, designing static analysis tools, and developing collaborative naming conventions.
📝 Abstract
Identifier names, which make up a significant portion of a codebase, are the cornerstone of effective program comprehension. However, research has shown that poorly chosen names can significantly increase cognitive load and hinder collaboration. Even names that appear readable in isolation may cause misunderstandings when they closely resemble other names in structure or functionality. In this exploratory study, we present preliminary findings on the occurrence of identifier name similarity in software projects by developing a taxonomy that categorizes its different forms. We envision this initial taxonomy providing researchers with a platform to analyze and evaluate the impact of identifier name similarity on code comprehension, maintainability, and collaboration among developers, while remaining open to further refinement and expansion.