🤖 AI Summary
This study addresses the lack of systematic understanding of how project characteristics relate to machine learning (ML)-specific code smells versus general Python code smells in ML projects, a gap that leaves quality assurance strategies one-size-fits-all. Analyzing 279 open-source ML projects on GitHub, the authors examine six project attributes (size, age, number of contributors, commit frequency, CI/CD adoption, and application domain) and use CodeSmile and Pylint to detect ML-specific and general Python code smells, respectively. ML-specific smells turn out to be 41-94 times less frequent than general ones, and only commit frequency and application domain are significantly associated with ML-specific code quality. Domains show distinct ML-specific problems: MLOps with configuration issues, reinforcement learning with tensor manipulation, and computer vision with GPU workflows. General Python smells, by contrast, show no significant association with any project characteristic. These findings challenge conventional technical debt assumptions and underscore the need for smell-type-specific quality strategies and domain-aware quality gates in CI/CD pipelines.
📝 Abstract
Machine learning systems consist of general-purpose code as well as machine-learning-specific code. While ML-specific code smells have been identified, their connection to project characteristics and their interaction with overall code quality are not well understood. Without this knowledge, quality assurance strategies remain one-size-fits-all, failing to account for the contextual factors that drive technical debt in ML systems. We present empirical evidence by examining how six project features (size, age, contributors, commit frequency, CI/CD adoption, and domain) relate to both ML-specific and general Python code quality in 279 open-source ML projects on GitHub. Using CodeSmile for ML code smells and Pylint for general Python smells, our results show: (1) ML code smells are 41-94 times less frequent than general Python smells; (2) commit frequency and domain are significantly associated with ML-specific quality, while project size, team size, age, and CI/CD adoption are not, challenging traditional views on technical debt; (3) general Python smells are not linked to any project characteristic, indicating systemic coding issues that are independent of project context; (4) domains that suffer most from ML-specific smells are not necessarily the same domains that suffer most from general Python smells, necessitating tailored quality strategies for each smell type. MLOps often involves configuration issues, Reinforcement Learning faces challenges with tensor manipulation, and Computer Vision encounters problems with GPU workflows. Overall, ML code quality depends on domain-specific practices and specialized CI/CD quality gates, as standard automation often overlooks domain-specific correctness problems.
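As a rough illustration of what a domain-aware quality gate could look like in a CI/CD pipeline, the sketch below normalizes an ML-specific smell count by project size and compares it against a per-domain limit. It is not the authors' method: the function name `ml_smell_gate`, the per-KLOC normalization, and every threshold value are assumptions made up for this example, and the smell count is assumed to come from a detector such as CodeSmile run earlier in the pipeline.

```python
from dataclasses import dataclass

# Hypothetical per-domain limits on ML-specific smells per 1,000 lines of code.
# These numbers are placeholders for illustration, not values from the study.
ML_SMELL_LIMITS_PER_KLOC = {
    "mlops": 0.5,
    "reinforcement_learning": 0.8,
    "computer_vision": 0.8,
}
DEFAULT_LIMIT_PER_KLOC = 1.0  # also hypothetical; used for unlisted domains


@dataclass
class GateResult:
    domain: str
    density: float  # ML-specific smells per 1,000 lines of code
    limit: float    # limit applied for this domain
    passed: bool


def ml_smell_gate(domain: str, ml_smell_count: int, lines_of_code: int) -> GateResult:
    """Check a project's ML-specific smell density against its domain limit."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    density = ml_smell_count / (lines_of_code / 1000)
    limit = ML_SMELL_LIMITS_PER_KLOC.get(domain, DEFAULT_LIMIT_PER_KLOC)
    return GateResult(domain, density, limit, density <= limit)
```

A pipeline step could call `ml_smell_gate("mlops", ml_smell_count=3, lines_of_code=10_000)` and fail the build when `passed` is false. Keeping the limits in a per-domain table reflects the abstract's point that a single threshold for all projects would miss domain-specific problems.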