Automatic Identification of Machine Learning-Specific Code Smells

📅 2025-08-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing research lacks systematic, empirically validated tools for identifying machine learning (ML)-specific code smells. Method: This paper introduces the first taxonomy of ML code smells, developed via design science research—integrating literature synthesis, domain expert consultation, and static code analysis—and implements it in MLpylint, an open-source tool that automatically detects ML-specific issues such as data leakage, hardcoded hyperparameters, and lack of model reproducibility. Contribution/Results: Empirical evaluation across 160 GitHub-hosted ML projects demonstrates an 89.3% detection accuracy; expert surveys confirm high practical utility and engineering applicability. This work bridges a critical gap in ML software engineering by enabling automated code quality assurance, thereby enhancing the maintainability and reliability of ML systems.
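To make the kind of check described above concrete, here is a minimal sketch of how one ML-specific smell, hardcoded hyperparameters, could be detected with static analysis over Python's `ast` module. This is an illustration only: the hyperparameter name list and the heuristic (a numeric literal bound to a known keyword at a call site) are assumptions, not MLpylint's actual rule set.

```python
import ast

# Assumed set of keyword names treated as hyperparameters for this sketch;
# a real tool would use a curated, configurable catalogue.
HYPERPARAM_NAMES = {"learning_rate", "n_estimators", "max_depth", "epochs"}

def find_hardcoded_hyperparams(source: str):
    """Return (lineno, keyword) pairs where a known hyperparameter
    name is bound to a numeric literal at a call site."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if (kw.arg in HYPERPARAM_NAMES
                        and isinstance(kw.value, ast.Constant)
                        and isinstance(kw.value.value, (int, float))):
                    findings.append((node.lineno, kw.arg))
    return findings

code = "model = GradientBoosting(learning_rate=0.1, n_estimators=100)\n"
print(find_hardcoded_hyperparams(code))  # [(1, 'learning_rate'), (1, 'n_estimators')]
```

A production checker would instead register this logic as a Pylint plugin so findings surface alongside ordinary lint messages; the standalone function above just shows the core AST traversal.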

📝 Abstract
Machine learning (ML) has rapidly grown in popularity, becoming vital to many industries. Currently, the research on code smells in ML applications lacks tools and studies that address the identification and validity of ML-specific code smells. This work investigates suitable methods and tools to design and develop a static code analysis tool (MLpylint) based on code smell criteria. This research employed the Design Science Methodology. In the problem identification phase, a literature review was conducted to identify ML-specific code smells. In solution design, a secondary literature review and consultations with experts were performed to select methods and tools for implementing the tool. We evaluated the tool on data from 160 open-source ML applications sourced from GitHub. We also conducted a static validation through an expert survey involving 15 ML professionals. The results indicate the effectiveness and usefulness of MLpylint. We aim to extend our current approach by investigating ways to introduce MLpylint seamlessly into development workflows, fostering a more productive and innovative developer environment.
Problem

Research questions and friction points this paper is trying to address.

Identify ML-specific code smells that lack detection tools
Develop a static analysis tool for ML code
Validate the tool's effectiveness with experts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed MLpylint for static code analysis
Used expert surveys for tool validation
Evaluated the tool on 160 open-source ML applications