🤖 AI Summary
This study addresses the lack of systematic empirical investigation into the prevalence, evolution, and interrelationships of community smells in open-source machine learning (ML) projects. We conduct the first longitudinal, quantitative analysis of six community smell types, using the CADOCS tool to extract static socio-technical metrics and to model temporal trends across 188 ML repositories in the NICHE dataset. The results show that certain smells, such as Prima Donna Effects and Sharing Villainy, are highly prevalent and fluctuate considerably over time, and that several smells strongly co-occur. By filling an empirical gap in the socio-technical dimension of ML engineering, this work provides a theoretical foundation and practical insights for diagnosing collaboration health and supporting evidence-based governance in ML development communities.
📝 Abstract
Effective software development relies on managing both collaboration and technology, but socio-technical challenges can harm team dynamics and increase technical debt. Although teams working on ML-enabled systems are interdisciplinary, research has largely focused on technical issues, leaving their socio-technical dynamics underexplored. This study addresses this gap by examining the prevalence, evolution, and interrelations of community smells in open-source ML projects. We conducted an empirical study on 188 repositories from the NICHE dataset, using the CADOCS tool to identify and analyze community smells, and focused on their prevalence, interrelations, and temporal variations. We found that certain smells, such as Prima Donna Effects and Sharing Villainy, are more prevalent and fluctuate more over time than others, such as Radio Silence or Organizational Skirmish. These insights may help ML project managers address socio-technical issues and improve team coordination.
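The three analyses named above (prevalence, interrelations, temporal variation) can be illustrated with a minimal sketch. It assumes smell detections have already been exported per repository as sets of smell names; the helper names, the toy data, and the choice of the Jaccard index for co-occurrence and population standard deviation for fluctuation are all illustrative assumptions, not the study's actual pipeline, metrics, or results.

```python
from itertools import combinations
from statistics import pstdev

# Hypothetical toy data (NOT results from the study): repository -> smells detected.
toy_detections: dict[str, set[str]] = {
    "repo_a": {"Prima Donna Effect", "Sharing Villainy"},
    "repo_b": {"Prima Donna Effect", "Sharing Villainy", "Radio Silence"},
    "repo_c": {"Prima Donna Effect"},
    "repo_d": set(),
}


def prevalence(detections: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of repositories in which each smell was detected at least once."""
    n = len(detections)
    counts: dict[str, int] = {}
    for smells in detections.values():
        for smell in smells:
            counts[smell] = counts.get(smell, 0) + 1
    return {smell: c / n for smell, c in counts.items()}


def co_occurrence(detections: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard index per smell pair: repos with both smells / repos with either smell.

    Assumption: Jaccard is a stand-in; the abstract does not name the measure used.
    """
    all_smells = sorted(set().union(*detections.values()))
    result: dict[tuple[str, str], float] = {}
    for a, b in combinations(all_smells, 2):
        both = sum(1 for s in detections.values() if a in s and b in s)
        either = sum(1 for s in detections.values() if a in s or b in s)
        result[(a, b)] = both / either if either else 0.0
    return result


def fluctuation(prevalence_over_time: list[float]) -> float:
    """Population standard deviation of a smell's prevalence across time windows."""
    return pstdev(prevalence_over_time)


if __name__ == "__main__":
    print(prevalence(toy_detections))
    print(co_occurrence(toy_detections))
    print(fluctuation([0.5, 0.7, 0.3, 0.6]))  # hypothetical per-window prevalence
```

A higher `fluctuation` value marks a smell whose prevalence varies more between time windows, which is one simple way to quantify the kind of temporal volatility the abstract reports.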