An Empirical Study of Policy-as-Code Adoption in Open-Source Software Projects

📅 2026-01-09
🏛️ arXiv.org
🤖 AI Summary
This study addresses the lack of empirical understanding of how Policy-as-Code (PaC) tools are adopted in real-world open-source software development. Using a mixed-methods approach, the authors analyze 399 GitHub repositories that use nine prominent PaC tools and introduce, for the first time, a comprehensive taxonomy of PaC usage comprising five major categories and fifteen subcategories. The findings show that PaC tools are frequently used in early-stage projects and are oriented mainly toward governance, configuration control, and documentation. The study also uncovers strong co-usage patterns between tools such as OPA and Gatekeeper and identifies emerging applications in domains such as MLOps. These insights provide empirical grounding for improving tool interoperability and informing best practices in PaC implementation.
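The co-usage finding (for example, OPA alongside Gatekeeper) can be made concrete with a small pair-counting sketch. It assumes each repository has already been reduced to the set of PaC tools detected in it. The function name `co_usage`, the toy repository names, and the choice of Jaccard similarity and lift as association measures are illustrative assumptions, not the authors' documented method.

```python
from collections import Counter
from itertools import combinations


def co_usage(repo_tools: dict[str, set[str]]) -> dict[tuple[str, str], dict[str, float]]:
    """Hypothetical helper: association statistics for every tool pair seen together.

    repo_tools maps a repository name to the set of PaC tools detected in it.
    """
    n_repos = len(repo_tools)
    single = Counter()  # number of repositories using each tool
    pairs = Counter()   # number of repositories using both tools of a pair
    for tools in repo_tools.values():
        single.update(tools)
        # sorted() gives a canonical order so (A, B) and (B, A) count as one pair
        pairs.update(combinations(sorted(tools), 2))

    stats = {}
    for (a, b), both in pairs.items():
        either = single[a] + single[b] - both  # repositories using at least one of the two
        stats[(a, b)] = {
            "count": both,
            "jaccard": both / either,
            # lift > 1: the pair co-occurs more often than independent adoption would predict
            "lift": (both * n_repos) / (single[a] * single[b]),
        }
    return stats


# Hypothetical toy input, not data from the study
repos = {
    "repo-a": {"OPA", "Gatekeeper"},
    "repo-b": {"OPA", "Gatekeeper", "Kyverno"},
    "repo-c": {"OPA"},
    "repo-d": {"Kyverno"},
}
stats = co_usage(repos)
for pair, s in sorted(stats.items()):
    print(pair, s["count"], round(s["jaccard"], 2), round(s["lift"], 2))
```

On this toy input, the (Gatekeeper, OPA) pair appears in two of four repositories, giving a Jaccard similarity of about 0.67 and a lift of about 1.33. Lift is included alongside raw counts because a popular tool co-occurs often with everything; lift corrects for each tool's base adoption rate.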

📝 Abstract
Context: Policy-as-Code (PaC) has become a foundational approach for embedding governance, compliance, and security requirements directly into software systems. While organizations increasingly adopt PaC tools, the software engineering community lacks an empirical understanding of how these tools are used in real-world development practices.

Objective: This paper aims to bridge this gap by conducting the first large-scale study of PaC usage in open-source software. Our goal is to characterize how PaC tools are adopted, what purposes they serve, and what governance activities they support across diverse software ecosystems.

Method: We analyzed 399 GitHub repositories using nine widely adopted PaC tools. Our mixed-methods approach combines quantitative analysis of tool usage and project characteristics with a qualitative investigation of policy files. We further employ a Large Language Model (LLM)-assisted classification pipeline, refined through expert validation, to derive a taxonomy of PaC usage consisting of 5 categories and 15 sub-categories.

Results: Our study reveals substantial diversity in PaC adoption. PaC tools are frequently used in early-stage projects and are heavily oriented toward governance, configuration control, and documentation. We also observe emerging PaC usage in MLOps pipelines and strong co-usage patterns, such as between OPA and Gatekeeper. Our taxonomy highlights recurring governance intents.

Conclusion: Our findings offer actionable insights for practitioners and tool developers. They highlight concrete usage patterns, emphasize actual PaC usage, and motivate opportunities for improving tool interoperability. This study lays the empirical foundation for future research on PaC practices and their role in ensuring trustworthy, compliant software systems.
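The abstract mentions an LLM-assisted classification pipeline refined through expert validation. One common way to quantify how closely automatic labels track expert labels is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The sketch below computes it for two label lists. The paper does not state that kappa was used, and the function name `cohens_kappa` and the category names in the example are hypothetical placeholders rather than the study's taxonomy labels.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Hypothetical helper: Cohen's kappa between two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # p_o: fraction of items on which both raters chose the same category
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: chance agreement from each rater's marginal category frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category for every item; kappa is
        # undefined here, so by convention report perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six policy files (placeholder categories)
llm_labels = ["governance", "governance", "configuration",
              "documentation", "governance", "configuration"]
expert_labels = ["governance", "configuration", "configuration",
                 "documentation", "governance", "configuration"]
kappa = cohens_kappa(llm_labels, expert_labels)
print(round(kappa, 3))
```

Here the raters agree on five of six files (p_o = 5/6) with chance agreement p_e = 13/36, so kappa is 17/23, roughly 0.74. Disagreements flagged this way are the natural candidates for expert review and prompt refinement.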
Problem

Research questions and friction points this paper is trying to address.

Policy-as-Code
open-source software
governance
empirical study
software compliance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Policy-as-Code
empirical study
LLM-assisted classification
governance taxonomy
open-source software
Patrick Loic Foalem
Department of Computer and Software Engineering, Polytechnique Montreal, Montreal, H3T 1J4, Quebec, Canada
Foutse Khomh
NSERC Arthur B. McDonald Fellow, CRC Tier 1, Canada CIFAR AI Chair, FRQ-IVADO Chair, Full Professor
Software engineering, Machine learning systems engineering, Mining software repositories, Reverse
Leuson Da Silva
Postdoctoral Fellow - Polytechnique Montreal
Software Engineering, Generative AI, Empirical Studies, Code Integration
Ettore Merlo
Department of Computer and Software Engineering, Polytechnique Montreal, Montreal, H3T 1J4, Quebec, Canada