Forecasting the Maintained Score from the OpenSSF Scorecard for GitHub Repositories linked to PyPI libraries

📅 2026-01-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the limitation of the OpenSSF Scorecard’s Maintained metric, which reflects only recent 90-day activity and thus lacks predictive capability for deprecation risk in open-source dependencies. To enable proactive risk assessment, this work formulates Maintained score prediction as a multivariate time series task, leveraging three years of historical data from 3,220 PyPI core libraries and their associated GitHub repositories. The authors propose four target representations and systematically evaluate VARMA, Random Forest, and LSTM models across varying training windows (3–12 months) and forecast horizons (1–6 months). Experimental results demonstrate that simple models can match or exceed the performance of deep learning approaches, achieving classification accuracies above 0.95 for maintenance status levels and over 0.80 for trend types, thereby offering a practical foundation for anticipating open-source project maintenance risks.

📝 Abstract
The OpenSSF Scorecard is widely used to assess the security posture of open-source software repositories, with the Maintained metric indicating recent development activity and helping identify potentially abandoned dependencies. However, this metric is inherently retrospective, reflecting only the past 90 days of activity and providing no insight into future maintenance, which limits its usefulness for proactive risk assessment. In this paper, we study to what extent future maintenance activity, as captured by the OpenSSF Maintained score, can be forecasted. We analyze 3,220 GitHub repositories associated with the top 1% most central PyPI libraries by PageRank and reconstruct historical Maintained scores over a three-year period. We formulate the task as multivariate time series forecasting and consider four target representations: raw scores, bucketed maintenance levels, numerical trend slopes, and categorical trend types. We compare a statistical model (VARMA), a machine learning model (Random Forest), and a deep learning model (LSTM) across training windows of 3-12 months and forecasting horizons of 1-6 months. Our results show that future maintenance activity can be predicted with meaningful accuracy, particularly for aggregated representations such as bucketed scores and trend types, achieving accuracies above 0.95 and 0.80, respectively. Simpler statistical and machine learning models perform on par with deep learning approaches, indicating that complex architectures are not required. These findings suggest that predictive modeling can effectively complement existing Scorecard metrics, enabling more proactive assessment of open-source maintenance risks.
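To make the setup concrete, the sliding-window formulation described in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the bucket boundaries, window length, and forecast horizon below are assumptions chosen for demonstration, and only the bucketed-level target with a Random Forest is shown.

```python
# Hedged sketch (not the paper's implementation): forecast a bucketed
# Maintained level one month ahead with a Random Forest, using a
# sliding window over monthly scores. All data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic monthly Maintained scores (0-10 scale) for 200 repositories
# over 36 months: a base level plus a slow per-repo trend and noise.
n_repos, n_months = 200, 36
trend = rng.uniform(-0.1, 0.1, size=(n_repos, 1)) * np.arange(n_months)
scores = np.clip(
    rng.uniform(3, 9, size=(n_repos, 1)) + trend
    + rng.normal(0, 0.5, size=(n_repos, n_months)),
    0, 10,
)

def bucket(s):
    """Map raw scores to coarse maintenance levels (assumed boundaries)."""
    return np.digitize(s, [3.0, 7.0])  # 0 = low, 1 = medium, 2 = high

window, horizon = 3, 1  # 3-month training window, 1-month horizon
X, y = [], []
for repo in scores:
    for t in range(n_months - window - horizon + 1):
        X.append(repo[t:t + window])                       # score history
        y.append(bucket(repo[t + window + horizon - 1]))   # future level
X, y = np.array(X), np.array(y)

# Chronologically motivated split: train on the first 80% of windows.
split = int(0.8 * len(X))
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:split], y[:split])
acc = accuracy_score(y[split:], clf.predict(X[split:]))
print(f"bucketed-level accuracy: {acc:.2f}")
```

Because maintenance scores are strongly autocorrelated month to month, even this simple classifier does well on the bucketed target, which is consistent with the paper's finding that simple models rival deep learning approaches for aggregated representations.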
Problem

Research questions and friction points this paper is trying to address.

OpenSSF Scorecard
Maintained score
GitHub repositories
PyPI libraries
maintenance forecasting
Innovation

Methods, ideas, or system contributions that make the work stand out.

time series forecasting
open-source software security
maintained score prediction
multivariate forecasting
proactive risk assessment
Alexandros Tsakpinis
fortiss GmbH, Munich, Germany
Efe Berk Ergülec
Technical University of Munich, Munich, Germany
Emil Schwenger
Technical University of Munich, Munich, Germany
Alexander Pretschner
Professor of Computer Science, Technische Universität München
Software Engineering · Security · Model-Based Testing