🤖 AI Summary
This study investigates whether popularity and security are statistically associated in open-source PHP packages. Method: In a large-scale empirical analysis of nearly 400,000 PHP packages, the work combines version-history parsing, CVE vulnerability mapping, and popularity metrics (e.g., download counts, GitHub stars), and uses non-parametric statistical tests, including the Mann–Whitney U test, to assess significance. Contribution/Results: It is the first study to systematically replicate and validate the “popularity–vulnerability” hypothesis within the PHP ecosystem: packages with known CVEs are significantly more popular on average than those without (p < 0.001). By providing large-scale, language-specific support for a long-standing conjecture, the work strengthens the empirical foundation of software security knowledge.
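To make the group comparison concrete, here is a minimal sketch of a two-sided Mann–Whitney U test in plain Python. It is not the paper's code. It assumes the normal approximation with a tie correction and no continuity correction, and the function name `mann_whitney_u` and the download counts are hypothetical, chosen only for illustration.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, approximate p-value).
    Illustrative sketch only; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                      key=lambda t: t[0])

    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # Rank sum of the first sample, converted to the U statistic.
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation is tied; no evidence of a difference
        return u1, 1.0

    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u1, p_value


# Hypothetical download counts, NOT data from the study.
with_cve = [12000, 54000, 8700, 150000, 31000]
without_cve = [40, 310, 25, 980, 120]
u_stat, p = mann_whitney_u(with_cve, without_cve)
print(f"U = {u_stat}, p = {p:.4f}")
```

A rank-based test suits popularity data because download and star counts are typically heavily skewed. For real analyses, an established implementation such as `scipy.stats.mannwhitneyu` is the safer choice.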
📝 Abstract
There has been a long-standing hypothesis, in both research and popular discourse, that software's popularity is related to its security or insecurity. A few empirical studies have also examined the hypothesis, either explicitly or implicitly. The present work continues and contributes to this research with a replication-motivated, large-scale analysis of software written in the PHP programming language. The dataset examined contains nearly four hundred thousand open source software packages written in PHP. According to the results, which are based on reported security vulnerabilities, the hypothesis holds: packages that have been affected by vulnerabilities over their release histories are generally more popular than packages that have not been affected by a single vulnerability. With these replication results, the paper contributes to the efforts to strengthen the empirical knowledge base in cyber and software security.