🤖 AI Summary
This study addresses the lack of empirical understanding of how Policy-as-Code (PaC) tools are adopted in real-world open-source software development. Using a mixed-methods approach, the authors analyze 399 GitHub repositories that use nine widely adopted PaC tools and derive a taxonomy of PaC usage comprising five categories and fifteen sub-categories. The findings show that PaC tools are frequently used in early-stage projects and are heavily oriented toward governance, configuration control, and documentation. The study also finds strong co-usage patterns between tools such as OPA and Gatekeeper, and emerging PaC usage in MLOps pipelines. These results provide empirical grounding for improving tool interoperability and for future research on PaC practices.
📝 Abstract
\textbf{Context:} Policy-as-Code (PaC) has become a foundational approach for embedding governance, compliance, and security requirements directly into software systems. Although organizations increasingly adopt PaC tools, the software engineering community lacks an empirical understanding of how these tools are used in real-world development practice. \textbf{Objective:} This paper aims to bridge this gap by conducting the first large-scale study of PaC usage in open-source software. Our goal is to characterize how PaC tools are adopted, what purposes they serve, and what governance activities they support across diverse software ecosystems. \textbf{Method:} We analyzed 399 GitHub repositories that use nine widely adopted PaC tools. Our mixed-methods approach combines a quantitative analysis of tool usage and project characteristics with a qualitative investigation of policy files. We further employ a Large Language Model (LLM)--assisted classification pipeline, refined through expert validation, to derive a taxonomy of PaC usage consisting of 5 categories and 15 sub-categories. \textbf{Results:} Our study reveals substantial diversity in PaC adoption. PaC tools are frequently used in early-stage projects and are heavily oriented toward governance, configuration control, and documentation. We also observe emerging PaC usage in MLOps pipelines and strong co-usage patterns, such as between OPA and Gatekeeper. Our taxonomy highlights recurring governance intents. \textbf{Conclusion:} Our findings offer actionable insights for practitioners and tool developers. They highlight concrete usage patterns grounded in actual PaC usage and motivate opportunities for improving tool interoperability. This study lays the empirical foundation for future research on PaC practices and their role in ensuring trustworthy, compliant software systems.
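The co-usage patterns mentioned in the results could, in principle, be measured by counting how often each pair of tools appears in the same repository. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `co_usage_counts`, the repository names, and `ToolX` are hypothetical placeholders, and the toy input is illustrative rather than data from the study.

```python
from collections import Counter
from itertools import combinations


def co_usage_counts(repo_tools):
    """Count, for each unordered pair of tools, how many repositories use both.

    repo_tools maps a repository name to the list of PaC tools detected in it.
    (Hypothetical input format; the paper does not specify one.)
    """
    pair_counts = Counter()
    for tools in repo_tools.values():
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(tools)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Toy, made-up input: repository names and "ToolX" are placeholders.
repo_tools = {
    "repo-a": ["OPA", "Gatekeeper"],
    "repo-b": ["OPA", "Gatekeeper", "ToolX"],
    "repo-c": ["ToolX"],
}

counts = co_usage_counts(repo_tools)
for pair, n in counts.most_common():
    print(pair, n)
```

Sorting each repository's tool set before pairing ensures that `("Gatekeeper", "OPA")` and `("OPA", "Gatekeeper")` are counted as the same pair.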