🤖 AI Summary
This study investigates the association between community “smells” (e.g., Radio Silence, Organizational Silos) and self-admitted technical debt (SATD) in open-source machine learning (ML) projects. Leveraging multi-version release data from 155 ML projects, we propose a methodological framework that integrates community smell detection, context-aware SATD identification, and hierarchical statistical analysis, including chi-square tests, temporal modeling, and cross-scale comparison. Our analysis reveals, for the first time, a strong positive correlation: authority- and communication-related smells are significantly associated with higher SATD incidence. We identify six SATD categories, each mapping to specific community smells. Furthermore, small-, medium-, and large-scale projects show divergent co-evolution trajectories of smells and SATD; notably, code and design debt persist longer under communication breakdowns. These findings provide empirical, community-level evidence and actionable intervention points for SATD governance in ML projects.
📝 Abstract
Community smells reflect poor organizational practices that often lead to socio-technical issues and the accumulation of Self-Admitted Technical Debt (SATD). While prior studies have explored these problems in general software systems, their interplay in machine learning (ML)-based projects remains largely underexamined. In this study, we investigated the prevalence of community smells and their relationship with SATD in open-source ML projects, analyzing data at the release level. First, we examined the prevalence of ten community smell types across the releases of 155 ML-based systems and found that community smells are widespread, with distinct distribution patterns across small, medium, and large projects. Second, we detected SATD at the release level and applied statistical analysis to examine its correlation with community smells. Our results showed that certain smells, such as Radio Silence and Organizational Silos, are strongly correlated with higher SATD occurrences. Third, we considered the six identified types of SATD to determine which community smells are most associated with each debt category. Our analysis revealed that authority- and communication-related smells often co-occur with persistent code and design debt. Finally, we analyzed how community smells and SATD evolve across releases, uncovering project-size-dependent trends and shared trajectories. Our findings emphasize the importance of early detection and mitigation of socio-technical issues to maintain the long-term quality and sustainability of ML-based systems.
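To make the release-level association analysis concrete, the following is a minimal sketch of a Pearson chi-square test of independence on a 2x2 table (smell present or absent versus SATD present or absent). It is not the authors' pipeline: the function name `chi_square_2x2` and the counts are hypothetical and invented purely for illustration, and the paper's actual test setup may differ.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    No continuity correction is applied. Returns the statistic, the
    p-value (1 degree of freedom), and the phi effect size.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    phi = math.sqrt(chi2 / n)
    return chi2, p_value, phi


# Hypothetical release counts (assumption, not data from the study).
# Rows: smell present / smell absent. Columns: SATD present / SATD absent.
counts = [[30, 10], [20, 40]]
chi2, p_value, phi = chi_square_2x2(counts)
print(f"chi2={chi2:.3f}, p={p_value:.2e}, phi={phi:.3f}")
```

A small p-value here would indicate only that smell presence and SATD presence are not independent across releases; it says nothing about which one causes the other.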