Cache-Related Smells in GitLab CI/CD: Comprehensive Catalog, Automated Detection, and Empirical Evidence

📅 2026-04-20

🤖 AI Summary
This study addresses cache-related smells in GitLab CI/CD pipelines, which degrade pipeline performance and reliability. The authors present a catalog of ten cache-related smells, validated against a corpus of grey literature, and introduce CROSSER, an automated detection tool that combines static analysis with rule-based matching and covers seven of the ten smells. On a manually labeled dataset of 82 mature projects, CROSSER achieves an overall F1 score of 0.98. A separate empirical study of 228 mature open-source projects finds that 89% of them exhibit at least one cache smell, and suggests that developers may not be aware of higher-level caching functionalities.

📝 Abstract
Continuous Integration and Deployment (CI/CD) facilitate rapid software delivery, making fast feedback and minimal downtime essential. While caching has been shown to be an effective technique for tackling pipeline performance and reliability issues, existing works have primarily focused on missing dependency caches, ignoring other types of caches and cache misconfigurations. In this paper, we present a comprehensive catalog of ten cache-related smells in GitLab CI/CD that negatively impact performance and reliability, validated on a corpus of grey literature. To address the smells, we propose CROSSER, a tool that automatically detects seven of the ten smells. We evaluate CROSSER on a manually labeled dataset of 82 mature projects, achieving an overall F1 score of 0.98. Finally, we investigate the presence of smells across a large dataset of 228 mature open-source projects and outline our empirical findings. Our results show that the smells are widespread: only 11% of the projects are free of them. We also show that developers may not be aware of higher-level caching functionalities.
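
To make the idea of a rule-based smell check concrete, here is a minimal Python sketch for the one smell the abstract names explicitly, a missing dependency cache. It is not CROSSER's implementation: the function names, the `INSTALL_HINTS` list, and the sample pipeline are hypothetical, and the input is assumed to be a `.gitlab-ci.yml` file already parsed into a dictionary. It also ignores GitLab features such as `extends`, `include`, and `inherit`.

```python
# Hypothetical illustration only: these rules are NOT taken from CROSSER.
# Substrings that suggest a job downloads dependencies (assumed, incomplete).
INSTALL_HINTS = ("npm ci", "npm install", "pip install", "bundle install")

# Top-level keys of .gitlab-ci.yml that configure the pipeline, not a job.
RESERVED_KEYS = {
    "default", "stages", "variables", "include", "workflow",
    "cache", "image", "services", "before_script", "after_script",
}


def has_cache(section):
    """Return True if a mapping defines a non-empty `cache` entry."""
    return isinstance(section, dict) and bool(section.get("cache"))


def find_missing_dependency_cache(pipeline):
    """List jobs that install dependencies without any cache configured."""
    # A cache may come from the top level or from the `default` section.
    inherited = has_cache(pipeline) or has_cache(pipeline.get("default"))
    flagged = []
    for name, job in pipeline.items():
        # Skip configuration keys, hidden template jobs, and non-mappings.
        if name in RESERVED_KEYS or name.startswith(".") or not isinstance(job, dict):
            continue
        script = job.get("script", [])
        lines = [script] if isinstance(script, str) else list(script)
        installs = any(hint in str(line) for line in lines for hint in INSTALL_HINTS)
        # A job-level `cache` key overrides inheritance; an empty one disables it.
        cached = bool(job["cache"]) if "cache" in job else inherited
        if installs and not cached:
            flagged.append(name)
    return flagged


pipeline = {
    "stages": ["build", "test"],
    "build": {"stage": "build", "script": ["npm ci", "npm run build"]},
    "test": {
        "stage": "test",
        "script": ["npm ci", "npm test"],
        "cache": {"key": {"files": ["package-lock.json"]}, "paths": [".npm/"]},
    },
}
print(find_missing_dependency_cache(pipeline))
```

In this sample, only the `build` job is reported, because it runs `npm ci` without defining or inheriting a cache. A real detector would need to resolve templates and included files before applying such a rule.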
Problem

Research questions and friction points this paper is trying to address.

cache-related smells
CI/CD
GitLab
performance
reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

cache-related smells
CI/CD
automated detection
GitLab
empirical study