🤖 AI Summary
This work addresses code smells that arise when large language model (LLM) inference is poorly integrated into software systems, and argues that such practices need systematic identification and documentation. Building on the authors' earlier work, the study presents a self-contained taxonomy and catalog of nine LLM code smells and introduces SpecDetect4LLM, a static analysis tool that combines rule-based and pattern-matching techniques to detect them automatically. An evaluation on 692 open-source projects (171,194 source files) finds that 73.5% contain at least one LLM code smell. The tool achieves a precision of 91.3% and a recall of 71.8%, helping to close the gap in both guidelines and practical tooling for the quality of LLM-integrated software.
📝 Abstract
Large Language Models (LLMs) are increasingly integrated into software systems for diverse purposes, due to their versatility, flexibility, and ability to simulate human reasoning to some extent. However, poor integration of LLM inference in source code can undermine software system quality. Therefore, inadequate LLM integration coding practices must be documented to help developers mitigate such issues. Following our earlier work on LLM code smells, this paper consolidates and refines the concept by presenting a self-contained taxonomy and a catalog of nine LLM code smells. We also create SpecDetect4LLM, a static source code analysis tool for their detection, and conduct extensive empirical evaluations of its detection effectiveness (precision and recall) as well as the prevalence of LLM code smells across 692 open-source software projects (171,194 source files). Our results show that LLM code smells affect 73.5% of the analyzed systems, and that SpecDetect4LLM detects them with a precision of 91.3% and a recall of 71.8%.
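For readers less familiar with the two effectiveness metrics reported above, the following is a minimal sketch of how precision and recall are conventionally computed for a detector. It is not the authors' evaluation code: the function name `precision_recall` and the sample findings are hypothetical and chosen only for illustration. Precision is the share of reported smells that are real; recall is the share of real smells that were reported.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall(
    detected: Iterable[Hashable], actual: Iterable[Hashable]
) -> Tuple[float, float]:
    """Return (precision, recall) for a set of detections against ground truth."""
    detected_set, actual_set = set(detected), set(actual)
    # True positives: findings that are both reported and genuinely present.
    true_positives = len(detected_set & actual_set)
    # Precision = TP / (TP + FP); defined as 0.0 when nothing was reported.
    precision = true_positives / len(detected_set) if detected_set else 0.0
    # Recall = TP / (TP + FN); defined as 0.0 when there is nothing to find.
    recall = true_positives / len(actual_set) if actual_set else 0.0
    return precision, recall


# Hypothetical findings keyed by (file, line); NOT data from the paper.
detected = {("app.py", 10), ("app.py", 42), ("bot.py", 7)}
actual = {("app.py", 10), ("bot.py", 7), ("bot.py", 99), ("cli.py", 3)}

p, r = precision_recall(detected, actual)
print(f"precision={p:.3f} recall={r:.3f}")
```

Read this way, the reported figures suggest that most of what the tool flags is a genuine smell (high precision), while a noticeable share of existing smells goes undetected (lower recall).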