🤖 AI Summary
This study addresses the early automated identification of security bug reports (SBRs). We systematically compare BERT and Random Forest (RF) across intra-project, cross-project, and mixed-data scenarios, employing G-measure for evaluation and establishing a standardized transfer learning experimental framework. Key contributions: (1) RF significantly outperforms BERT in intra-project prediction (average G-measure 34% higher); (2) BERT demonstrates superior robustness in cross-project (62% G-measure) and mixed-data (66%) settings, whereas RF degrades sharply to 46%; (3) we show, for the first time, that adding cross-project SBRs improves both models’ performance, while mixing in non-security defects harms RF yet benefits BERT. Our findings empirically delineate the applicability boundaries of each model, providing actionable guidance for model selection and dataset construction in SBR detection tasks.
📝 Abstract
Early detection of security bug reports (SBRs) is crucial for preventing vulnerabilities and ensuring system reliability. While machine learning models have been developed for SBR prediction, their predictive performance still has room for improvement. In this study, we conduct a comprehensive comparison between BERT and Random Forest (RF), a competitive baseline, for predicting SBRs. The results show that RF outperforms BERT in within-project prediction, with a 34% higher average G-measure. Adding only SBRs from various projects improves both models' average performance. However, including both security and non-security bug reports significantly reduces RF's average performance to 46%, while boosting BERT to its best average performance of 66%, surpassing RF. In cross-project SBR prediction, BERT achieves a remarkable 62% G-measure, substantially higher than RF's.
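Both the summary and the abstract report results as G-measure percentages. In defect-prediction studies this metric is commonly defined as the harmonic mean of recall (probability of detection) and specificity (one minus the false-alarm rate); the exact definition used in this study is not stated here, so the following is a minimal sketch under that common assumption:

```python
def g_measure(tp: int, fn: int, fp: int, tn: int) -> float:
    """G-measure from a binary confusion matrix (assumed common definition:
    harmonic mean of recall and specificity)."""
    recall = tp / (tp + fn)        # probability of detection (pd)
    specificity = tn / (tn + fp)   # 1 - probability of false alarm (pf)
    return 2 * recall * specificity / (recall + specificity)

# Hypothetical confusion matrix for an SBR classifier:
# 40 SBRs detected, 10 missed, 20 false alarms, 80 correct rejections.
print(g_measure(tp=40, fn=10, fp=20, tn=80))  # → 0.8
```

Because it balances detection against false alarms, the G-measure penalizes a classifier that labels everything as an SBR, which is why it is favored over plain accuracy on imbalanced bug-report datasets.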