AI Summary
Python's software supply chain relies heavily on the PyPI ecosystem, yet its complex web of transitive dependencies allows vulnerabilities to propagate widely, and existing studies lack a systematic, quantitative analysis of how prevalent vulnerable dependencies are.
Method: We propose PyPitfall, the first framework to construct a complete dependency graph for all 378,573 PyPI packages, perform semantic version-range resolution, and precisely match dependencies against the CVE database, automating the analysis of vulnerability propagation paths.
Contribution/Results: Our analysis identifies 4,655 packages that explicitly depend on known-vulnerable versions and 141,044 packages whose permissive version constraints admit such versions. These findings quantify the scale of systemic security risk arising from dependency mismanagement in the Python ecosystem and provide the first large-scale empirical evidence, and a methodological foundation, for software supply chain security governance.
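
Vulnerability propagation path analysis of the kind described above can be viewed as a reachability search over the reverse dependency graph. The following is a minimal sketch, not PyPitfall's implementation: the graph, the package names, and the function `propagation_paths` are hypothetical, and the sketch works at package level, ignoring the version constraints that a real analysis must resolve.

```python
from collections import deque

# Hypothetical forward dependency graph (package -> its direct dependencies).
# Names are illustrative only, not real PyPI data; versions are ignored.
DEPENDS_ON: dict[str, list[str]] = {
    "app": ["web-framework", "utils"],
    "web-framework": ["http-lib"],
    "utils": [],
    "http-lib": [],
    "cli-tool": ["utils"],
}


def propagation_paths(depends_on: dict[str, list[str]],
                      vulnerable: set[str]) -> dict[str, list[str]]:
    """For every package that directly or transitively depends on a vulnerable
    package, return one shortest path from it down to a vulnerable package."""
    # Invert the edges so the search can walk from a vulnerable package
    # up to the packages that depend on it.
    dependents: dict[str, list[str]] = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(pkg)

    # Multi-source breadth-first search seeded with every vulnerable package.
    seeds = sorted(vulnerable)
    paths = {v: [v] for v in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in paths:  # first visit in BFS = shortest path
                paths[parent] = [parent] + paths[current]
                queue.append(parent)

    # Report only affected dependents, not the vulnerable seeds themselves.
    return {p: path for p, path in paths.items() if p not in vulnerable}


print(propagation_paths(DEPENDS_ON, {"http-lib"}))
# -> {'web-framework': ['web-framework', 'http-lib'], 'app': ['app', 'web-framework', 'http-lib']}
```

Breadth-first search is used so that each reported path is a shortest chain from the affected package to a vulnerable one; `cli-tool` is not reported because none of its dependencies reach `http-lib`.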
Abstract
Python software development relies heavily on third-party packages, and their direct and transitive dependencies form a labyrinth of software supply chains. While code reuse is convenient, vulnerabilities within these dependency chains can propagate, potentially affecting downstream packages and applications. PyPI, the official Python package repository, hosts a vast number of packages, yet a comprehensive analysis of the prevalence of vulnerable dependencies across it is lacking. This paper introduces PyPitfall, a quantitative analysis of vulnerable dependencies across the PyPI ecosystem. We analyzed the dependency structures of 378,573 PyPI packages and identified 4,655 packages that explicitly require at least one known-vulnerable version and 141,044 packages that permit vulnerable versions within their specified version ranges. By characterizing the ecosystem-wide dependency landscape and the security impact of transitive dependencies, we aim to raise awareness of Python software supply chain security.
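
To make the difference between the two counts concrete, here is a hedged sketch that classifies a single dependency declaration against a release history and a set of known-vulnerable versions. The category definitions are an assumed reading of the abstract: an exact `==` pin on a vulnerable version counts as "explicit", and any other specifier that admits a vulnerable release counts as "permissive". The version parser supports only a small subset of PEP 440 (plain numeric releases, no `~=`, wildcards, or pre-releases), and all names and data are hypothetical. Production code should rely on a full implementation such as the `packaging` library.

```python
import re

# Simplified PEP 440 subset (assumption): plain numeric releases only, no
# epochs, pre/post/dev segments, local labels, wildcards, or "~=".
_CLAUSE = re.compile(r"(==|!=|>=|<=|>|<)\s*(\d+(?:\.\d+)*)")

_OPS = {
    "==": lambda v, b: v == b,
    "!=": lambda v, b: v != b,
    ">=": lambda v, b: v >= b,
    "<=": lambda v, b: v <= b,
    ">": lambda v, b: v > b,
    "<": lambda v, b: v < b,
}


def parse_version(text: str) -> tuple[int, ...]:
    """'2.31.0' -> (2, 31); trailing zeros dropped so '1.0' == '1.0.0'."""
    if not re.fullmatch(r"\d+(?:\.\d+)*", text):
        raise ValueError(f"unsupported version: {text!r}")
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def parse_spec(spec: str) -> list[tuple[str, tuple[int, ...]]]:
    """Split a comma-separated specifier such as '>=2.0,<2.32' into clauses."""
    clauses = []
    for raw in spec.split(","):
        raw = raw.strip()
        if not raw:
            continue  # an empty spec places no constraint
        m = _CLAUSE.fullmatch(raw)
        if m is None:
            raise ValueError(f"unsupported clause: {raw!r}")
        clauses.append((m.group(1), parse_version(m.group(2))))
    return clauses


def allows(spec: str, version: str) -> bool:
    """True if `version` satisfies every clause (an empty spec allows anything)."""
    v = parse_version(version)
    return all(_OPS[op](v, bound) for op, bound in parse_spec(spec))


def classify(spec: str, released: list[str], vulnerable: set[str]) -> str:
    """Hypothetical labels for one dependency declaration:
    'explicit'   -> exact '==' pin on a vulnerable version
    'permissive' -> the spec admits at least one vulnerable release
    'safe'       -> no admitted release is known to be vulnerable
    """
    vuln = {parse_version(v) for v in vulnerable}
    clauses = parse_spec(spec)
    if len(clauses) == 1 and clauses[0][0] == "==" and clauses[0][1] in vuln:
        return "explicit"
    if any(allows(spec, r) and parse_version(r) in vuln for r in released):
        return "permissive"
    return "safe"


releases = ["1.0", "1.1", "2.0", "2.1"]  # hypothetical release history
known_vulnerable = {"1.1", "2.0"}        # hypothetical advisories
print(classify("==2.0", releases, known_vulnerable))     # -> explicit
print(classify(">=1.0,<3", releases, known_vulnerable))  # -> permissive
print(classify(">=2.1", releases, known_vulnerable))     # -> safe
```

The range is checked against published releases rather than in the abstract, because a range only matters through the versions an installer could actually select. Under these assumptions, an unconstrained requirement (an empty spec) admits every release and is therefore "permissive" whenever any release is vulnerable.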