"I wasn't sure if this is indeed a security risk": Data-driven Understanding of Security Issue Reporting in GitHub Repositories of Open Source npm Packages

📅 2025-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study reveals a critical labeling gap in the npm ecosystem: of 10.9 million GitHub issues across 45,466 packages, only 0.13% are explicitly tagged as security-related, even though implicit security concerns are widespread.

Method: Combining manual annotation with BERT and TF-IDF models, we identify 1.618 million untagged security issues (14.8% of all issues) and 4.462 million associated security-relevant comments. We further conduct a correlation analysis and mine user-developer interactions to characterize developer responsiveness and the efficacy of automated tools (bots).

Contribution/Results: At scale, we show empirically that missing security labels create substantial governance blind spots. Security issues with known solutions (e.g., a corresponding CVE) are handled quickly, whereas those without one may go unresolved, even when reproducible code is provided. The security bots in wide use today appear insufficient for either detection or assistance. We argue for smarter labeling tools and human-in-the-loop response mechanisms, and we publicly release the data and code to support evidence-based, scalable governance of open-source supply-chain security.
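To make the idea of a text classifier for untagged security issues concrete, here is a minimal sketch. It is not the authors' BERT/TF-IDF pipeline: it swaps in a simple TF-IDF nearest-centroid classifier, and the seed issues, the label names "security"/"other", and the class `SecurityIssueClassifier` are all hypothetical illustrations.

```python
import math
import re
from collections import Counter

# Hypothetical toy sketch: a TF-IDF nearest-centroid classifier, standing in
# for (not reproducing) the paper's BERT/TF-IDF models.

def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())

def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + c)) + 1.0 for term, c in df.items()}

def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Term counts weighted by IDF, L2-normalized; terms unseen in training are dropped.
    vec = {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SecurityIssueClassifier:
    def fit(self, texts: list[str], labels: list[str]) -> "SecurityIssueClassifier":
        docs = [tokenize(t) for t in texts]
        self.idf = fit_idf(docs)
        vecs = [tfidf(d, self.idf) for d in docs]
        self.centroids = {}
        for label in sorted(set(labels)):
            # Centroid = mean of the TF-IDF vectors carrying this label.
            members = [v for v, lab in zip(vecs, labels) if lab == label]
            total: Counter = Counter()
            for v in members:
                total.update(v)
            self.centroids[label] = {t: w / len(members) for t, w in total.items()}
        return self

    def predict(self, text: str) -> str:
        # Assign the label whose centroid is most cosine-similar to the issue text.
        vec = tfidf(tokenize(text), self.idf)
        return max(self.centroids, key=lambda lbl: cosine(vec, self.centroids[lbl]))

# Hypothetical hand-labeled seed issues, mimicking a manual-annotation step.
TRAIN = [
    ("XSS vulnerability: unsanitized input allows script injection", "security"),
    ("Prototype pollution vulnerability in merge function", "security"),
    ("ReDoS: regex denial of service on crafted input", "security"),
    ("Add TypeScript type definitions", "other"),
    ("Docs typo in README installation section", "other"),
    ("Feature request: support ESM exports", "other"),
]

clf = SecurityIssueClassifier().fit([t for t, _ in TRAIN], [lab for _, lab in TRAIN])
print(clf.predict("Possible script injection vulnerability via unsanitized query input"))  # security
print(clf.predict("Typo in README"))  # other
```

A nearest-centroid rule keeps the sketch dependency-free; a real study at this scale would more plausibly use a trained linear model or a fine-tuned transformer, evaluated against held-out manual labels.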


📝 Abstract

npm (Node Package Manager) is the most important package manager for JavaScript development, with millions of users. Consequently, a plethora of earlier work has investigated how vulnerability reporting, patch propagation, and, more generally, the detection and resolution of security issues in such ecosystems can be facilitated. However, the ground reality of security-related issue reporting by users (and bots) in npm, along with the associated challenges, has been relatively less explored at scale. In this work, we bridge this gap by collecting 10,907,467 issues reported across GitHub repositories of 45,466 diverse npm packages. We found that the tags associated with these issues mark only 0.13% of them as security-related. However, our approach of manual analysis followed by high-accuracy machine learning models identifies 1,617,738 security-related issues that are not tagged as such (14.8% of all issues), as well as 4,461,934 comments made on these issues. We found that the bots in wide use today might not be sufficient for either detecting security issues or offering assistance. Furthermore, our analysis of user-developer interaction data hints that many user-reported security issues might not be addressed by developers: they are not tagged as security-related and might be closed without valid justification. A correlation analysis further hints that developers quickly handle security issues with known solutions (e.g., those corresponding to a CVE), whereas security issues without such known solutions (even with reproducible code) might not be resolved. Our findings offer actionable insights for improving security management in open-source ecosystems and highlight the need for smarter tools and better collaboration. The data and code for this work are available at https://doi.org/10.5281/zenodo.15614029
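The contrast between label-based and model-based counts can be sketched as follows. The regular expression for "security-looking" label names is a hypothetical assumption (the abstract does not list which labels counted as security-related); the two totals are taken directly from the abstract, and the tagged count is only approximate because 0.13% is itself a rounded figure.

```python
import re

# Hypothetical label pattern; the abstract does not list the labels the
# authors treated as security-related.
SECURITY_LABEL = re.compile(r"security|vulnerab|\bcve\b", re.IGNORECASE)

def is_tagged_security(labels: list[str]) -> bool:
    # An issue counts as "tagged" if any of its label names matches the pattern.
    return any(SECURITY_LABEL.search(name) for name in labels)

def tagged_share(issues: list[list[str]]) -> float:
    # Fraction of issues carrying at least one security-looking label.
    if not issues:
        return 0.0
    return sum(is_tagged_security(labels) for labels in issues) / len(issues)

# Counts reported in the abstract.
TOTAL_ISSUES = 10_907_467
UNTAGGED_SECURITY = 1_617_738

# Share of all issues that are security-related but untagged.
untagged_pct = 100 * UNTAGGED_SECURITY / TOTAL_ISSUES
# Rough number of tagged issues implied by "0.13%" (approximate: the
# percentage is itself rounded).
approx_tagged = round(TOTAL_ISSUES * 0.0013)

print(f"untagged security issues: {untagged_pct:.1f}% of all issues")  # 14.8%
print(f"tagged security issues: roughly {approx_tagged:,}")  # 14,180
print(tagged_share([["bug"], ["security"], ["Vulnerability", "help wanted"], []]))  # 0.5
```

Under these assumptions, the model-identified untagged issues outnumber the label-tagged ones by roughly two orders of magnitude, which is the labeling gap the study describes.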

Problem

Research questions and friction points this paper is trying to address.

Understanding security issue reporting in npm GitHub repositories

Identifying untagged security-related issues using ML models

Analyzing developer responses to user-reported security issues
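One way to probe developer responses is to correlate a binary "references a CVE" indicator with time-to-close. The sketch below does this with a Pearson correlation (equivalent to the point-biserial correlation when one variable is 0/1). The issue titles, placeholder CVE IDs, and hours are hypothetical, and the summary does not state which correlation measure the authors used. Note also that a real analysis would have to handle issues that are never closed, which this toy data ignores.

```python
import math
import re

# CVE identifiers have the form CVE-YYYY-NNNN (four or more digits in the
# sequence part).
CVE_ID = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)

def mentions_cve(text: str) -> int:
    # 1 if the issue text references a CVE identifier, else 0.
    return 1 if CVE_ID.search(text) else 0

def pearson(xs: list[float], ys: list[float]) -> float:
    # Pearson correlation; with a 0/1 variable this is the point-biserial r.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical (issue title, hours until closed) pairs; placeholder CVE IDs.
ISSUES = [
    ("Upgrade dependency to fix CVE-0000-0001", 5.0),
    ("Known vulnerability CVE-0000-0002 in transitive dependency", 8.0),
    ("XSS when rendering user input, repro attached", 300.0),
    ("Possible prototype pollution in merge()", 450.0),
]

has_cve = [mentions_cve(title) for title, _ in ISSUES]
hours = [h for _, h in ISSUES]
r = pearson(has_cve, hours)
print(f"r = {r:.2f}")  # r = -0.96
```

A negative r here simply means that, in this toy data, CVE-referencing issues close faster; it illustrates the shape of the analysis, not the paper's result.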

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzed 10.9M GitHub issues in npm packages

Developed ML models to identify untagged security-related issues

Highlighted gaps in bot detection and developer response
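As a starting point for studying bot involvement, one could measure how many security issues receive any bot comment at all. This sketch assumes comment records shaped like GitHub REST API issue comments, whose `user` object carries `login` and `type` fields. The sample records are hypothetical, and the summary does not say how the authors judged whether bots were sufficient.

```python
# Hypothetical comment records shaped like GitHub REST API issue comments,
# where each comment has a "user" object with "login" and "type" fields.

def is_bot(comment: dict) -> bool:
    # GitHub App accounts typically have type "Bot" and a login ending in "[bot]".
    user = comment.get("user") or {}
    return user.get("type") == "Bot" or user.get("login", "").endswith("[bot]")

def bot_coverage(issue_comments: list[list[dict]]) -> float:
    # Fraction of issues that received at least one bot comment.
    if not issue_comments:
        return 0.0
    touched = sum(any(is_bot(c) for c in comments) for comments in issue_comments)
    return touched / len(issue_comments)

SECURITY_ISSUES = [
    [{"user": {"login": "dependabot[bot]", "type": "Bot"}, "body": "Bumps a dependency version."}],
    [{"user": {"login": "alice", "type": "User"}, "body": "Can confirm the XSS."}],
    [],
    [{"user": {"login": "bob", "type": "User"}, "body": "Any update?"},
     {"user": {"login": "stale[bot]", "type": "Bot"}, "body": "Closing as stale."}],
]

print(bot_coverage(SECURITY_ISSUES))  # 0.5
```

Coverage alone says nothing about helpfulness: in the last hypothetical issue, the only bot comment closes the report as stale, which is exactly the kind of unjustified closure the abstract warns about.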
