🤖 AI Summary
This work addresses the confounding of genuine development effort with administrative delays in predicting software defect repair time. Methodologically, it introduces a project-specific, interpretable prediction approach that explicitly separates development time from managerial overhead. It combines LDA-based semantic features extracted from issue descriptions with heterogeneous metadata (including priority, component, and assignee) to build a multi-dimensional feature space. An ensemble regression model produces per-project forecasts and is paired with an embedded interpretability framework to support transparent decision-making. Experimental evaluation across multiple open-source projects shows improved prediction accuracy, with mean absolute error (MAE) reduced by 18.7%–32.4% relative to baselines. The method also yields actionable insights for task assignment and defect prioritization.
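The idea of a per-project ensemble evaluated by MAE can be illustrated with a small, dependency-free sketch. It is not the authors' model. As simplifying assumptions, each issue is a dict of metadata fields plus an already-extracted `effort` value in hours, and the `topic` field stands in for a dominant topic id (in practice it might be the argmax of an LDA document-topic distribution; fitting LDA is omitted here). Each ensemble member is a hypothetical group-median predictor keyed on one feature, and the ensemble simply averages the members. All function names are illustrative.

```python
from collections import defaultdict
from statistics import mean, median


def fit_group_median(issues, key):
    """Base predictor: median effort of training issues sharing the same `key` value.

    Values unseen in training fall back to the median effort over all issues.
    Requires a non-empty list of training issues.
    """
    groups = defaultdict(list)
    for issue in issues:
        groups[issue[key]].append(issue["effort"])
    table = {value: median(efforts) for value, efforts in groups.items()}
    fallback = median(issue["effort"] for issue in issues)

    def predict(issue):
        return table.get(issue[key], fallback)

    return predict


def fit_aggregated_model(issues, keys=("priority", "component", "assignee", "topic")):
    """Averaging ensemble: one group-median member per feature, outputs averaged."""
    members = [fit_group_median(issues, key) for key in keys]

    def predict(issue):
        return mean(member(issue) for member in members)

    return predict


def mae(model, issues):
    """Mean absolute error between predicted and extracted effort."""
    return mean(abs(model(issue) - issue["effort"]) for issue in issues)


def relative_mae_reduction(baseline_mae, model_mae):
    """Fraction by which the model lowers MAE versus a baseline (0.25 means 25%).

    baseline_mae must be positive.
    """
    return 1 - model_mae / baseline_mae


if __name__ == "__main__":
    # Toy training data for a single project (hypothetical values).
    train = [
        {"priority": "High", "component": "ui", "assignee": "alice", "topic": 0, "effort": 4.0},
        {"priority": "High", "component": "db", "assignee": "bob", "topic": 1, "effort": 10.0},
        {"priority": "Low", "component": "ui", "assignee": "alice", "topic": 0, "effort": 2.0},
        {"priority": "Low", "component": "db", "assignee": "bob", "topic": 1, "effort": 8.0},
    ]
    model = fit_aggregated_model(train)
    new_issue = {"priority": "High", "component": "ui", "assignee": "alice", "topic": 0}
    print(model(new_issue))   # 4.0
    print(mae(model, train))  # 0.75
```

Averaging group medians keeps each member inspectable (one lookup table per feature), which loosely mirrors the interpretability goal; a real system would more likely use a learned ensemble such as random forests or gradient boosting, and would measure MAE on held-out issues rather than on the training set.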
📝 Abstract
In recent years, software development has become a predominantly online process, as more teams host and monitor their projects remotely. Modern approaches employ issue tracking systems such as Jira, where predicting the time required to resolve issues helps teams assign and prioritize project tasks effectively. Several methods have been developed for this task, widely known as bug-fix time prediction, yet they exhibit significant limitations. Most consider only the textual content of issues and/or use techniques that overlook issue semantics and metadata (e.g., priority or assignee expertise). Many also fail to distinguish actual development effort from administrative delays, such as the assignment and review phases, so their estimates do not reflect the true effort needed. In this work, we build an issue monitoring system that extracts the actual effort required to fix issues on a per-project basis. Our approach employs topic modeling to capture issue semantics and leverages metadata (components, labels, priority, issue type, assignees) for interpretable analysis of resolution time. Final predictions are generated by an aggregated model, enabling contributors to make informed decisions. An evaluation across multiple projects shows that the system can effectively estimate resolution time and provide valuable insights.
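One way to separate development effort from administrative delay is to replay an issue's status history and attribute each interval to the status the issue was in at the time. The sketch below assumes a simplified changelog of `(timestamp, status)` pairs, such as might be derived from an issue tracker's history, and a hypothetical, project-specific set `ACTIVE_STATUSES` of statuses that count as hands-on work. Every other status (waiting for assignment, in review, and so on) is counted as administrative delay. The status names and `split_resolution_time` are illustrative assumptions, not the paper's exact procedure.

```python
from datetime import datetime, timedelta

# Hypothetical, project-specific choice of statuses that count as development work.
# Every other status (e.g. "Open", "In Review") is treated as administrative delay.
ACTIVE_STATUSES = frozenset({"In Progress"})


def split_resolution_time(transitions, resolved_at, active=ACTIVE_STATUSES):
    """Split an issue's lifetime into (development_time, administrative_time).

    transitions: (timestamp, status) pairs; the earliest pair is the status the
    issue was created in. resolved_at: the time the issue was resolved.
    """
    ordered = sorted(transitions)
    if not ordered:
        raise ValueError("status history is empty")
    if resolved_at < ordered[-1][0]:
        raise ValueError("resolution must come after the last status change")
    dev = timedelta(0)
    admin = timedelta(0)
    # Each status lasts from its own timestamp until the next change (or resolution).
    ends = [ts for ts, _ in ordered[1:]] + [resolved_at]
    for (start, status), end in zip(ordered, ends):
        if status in active:
            dev += end - start
        else:
            admin += end - start
    return dev, admin


if __name__ == "__main__":
    history = [
        (datetime(2024, 1, 1, 9), "Open"),         # waiting for assignment
        (datetime(2024, 1, 3, 9), "In Progress"),  # hands-on work
        (datetime(2024, 1, 4, 17), "In Review"),   # waiting for review
    ]
    dev, admin = split_resolution_time(history, datetime(2024, 1, 6, 9))
    print(dev.total_seconds() / 3600, admin.total_seconds() / 3600)  # 32.0 88.0
```

In this toy history the issue stays open for 120 hours, but only 32 of them are spent in the active status, which is the kind of gap between raw resolution time and actual effort that the abstract describes. Which statuses count as active is a per-project decision, consistent with the per-project framing; states such as "Blocked" or "Reopened" could be handled by extending the set or adding a separate bucket.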