Systematic Hazard Analysis for Frontier AI using STPA

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Frontier AI companies lack systematic methodologies for identifying safety risks, particularly those arising from feedback-driven and interactive failure modes; unstructured manual assessment also suffers from limited causal traceability and incompleteness. Method: This work adapts STPA (System-Theoretic Process Analysis), a rigorous hazard analysis framework from system safety engineering with a track record in aviation and industrial safety, to frontier AI systems. The authors construct a control structure model for the threat model and scenario of 'A Sketch of an AI Control Safety Case' (Korbak et al., 2025), derive a list of Unsafe Control Actions, and explore the Loss Scenarios that could lead to a subset of them if left unmitigated. Contribution/Results: The analysis identifies causal factors that unstructured hazard analysis methodologies may miss, improving coverage, causal traceability, and robustness of hazard identification. Because the methodology is systematic, a larger proportion of the analysis can be conducted by LLMs, supporting scalable assurance that complements existing AI governance techniques such as capability thresholds, model evaluations, and emergency procedures.
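
To make the control-structure step concrete, the sketch below models a toy STPA control structure in Python. This is an illustrative assumption, not code or terminology from the paper: the class names (ControlAction, FeedbackChannel, ControlStructure) and the monitor/sandbox example are hypothetical, loosely inspired by the AI-control setting of Korbak et al. (2025).

```python
# Hypothetical sketch of an STPA control structure for an AI-control setup.
# All names and structure are illustrative assumptions, not the paper's artifacts.
from dataclasses import dataclass, field


@dataclass
class ControlAction:
    name: str                 # e.g. "approve code for execution"
    controller: str           # who issues the action
    controlled_process: str   # what receives it


@dataclass
class FeedbackChannel:
    name: str                 # e.g. "monitor suspicion score"
    source: str
    destination: str


@dataclass
class ControlStructure:
    controllers: list[str] = field(default_factory=list)
    processes: list[str] = field(default_factory=list)
    actions: list[ControlAction] = field(default_factory=list)
    feedback: list[FeedbackChannel] = field(default_factory=list)


# A toy model loosely inspired by the AI-control scenario of Korbak et al. (2025):
# a trusted monitor oversees an untrusted model that writes code.
structure = ControlStructure(
    controllers=["human overseer", "trusted monitor"],
    processes=["untrusted model", "execution sandbox"],
    actions=[
        ControlAction("approve code for execution", "trusted monitor", "execution sandbox"),
        ControlAction("escalate to human review", "trusted monitor", "human overseer"),
    ],
    feedback=[
        FeedbackChannel("suspicion score", "trusted monitor", "human overseer"),
        FeedbackChannel("execution logs", "execution sandbox", "trusted monitor"),
    ],
)
```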

📝 Abstract
All of the frontier AI companies have published safety frameworks where they define capability thresholds and risk mitigations that determine how they will safely develop and deploy their models. Adoption of systematic approaches to risk modelling, based on established practices used in safety-critical industries, has been recommended; however, frontier AI companies currently do not describe in detail any structured approach to identifying and analysing hazards. STPA (Systems-Theoretic Process Analysis) is a systematic methodology for identifying how complex systems can become unsafe, leading to hazards. It achieves this by mapping out controllers and controlled processes, then analysing their interactions and feedback loops to understand how harmful outcomes could occur (Leveson & Thomas, 2018). We evaluate STPA's ability to broaden the scope, improve traceability and strengthen the robustness of safety assurance for frontier AI systems. Applying STPA to the threat model and scenario described in 'A Sketch of an AI Control Safety Case' (Korbak et al., 2025), we derive a list of Unsafe Control Actions. From these we select a subset and explore the Loss Scenarios that lead to them if left unmitigated. We find that STPA is able to identify causal factors that may be missed by unstructured hazard analysis methodologies, thereby improving robustness. We suggest STPA could increase the safety assurance of frontier AI when used to complement or check coverage of existing AI governance techniques including capability thresholds, model evaluations and emergency procedures. The application of a systematic methodology supports scalability by increasing the proportion of the analysis that could be conducted by LLMs, reducing the burden on human domain experts.
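
For readers unfamiliar with STPA, Unsafe Control Actions are conventionally derived by crossing each control action with four guide words: not provided, provided when unsafe, provided too early/too late/out of order, and stopped too soon/applied too long (Leveson & Thomas, 2018). The sketch below mechanically enumerates candidate UCAs this way; it is a minimal illustration of that guide-word step under assumed names, not the paper's actual analysis pipeline.

```python
# Minimal sketch of STPA's guide-word step: each control action is crossed
# with the four standard UCA types to produce candidate Unsafe Control Actions.
# The example action and names are hypothetical.
UCA_TYPES = [
    "not provided when needed",
    "provided when it causes a hazard",
    "provided too early, too late, or out of order",
    "stopped too soon or applied too long",
]


def candidate_ucas(controller: str, action: str, process: str) -> list[str]:
    """Enumerate candidate UCAs for one control action (illustrative only)."""
    return [
        f"UCA: '{action}' by {controller} is {uca_type} (target: {process})"
        for uca_type in UCA_TYPES
    ]


# Hypothetical example drawn from an AI-control-style setup:
for uca in candidate_ucas("trusted monitor", "escalate to human review", "untrusted model output"):
    print(uca)
```

An analyst (or, as the paper suggests, an LLM) would then filter these candidates for hazard relevance and trace the retained ones to Loss Scenarios.
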
Problem

Research questions and friction points this paper is trying to address.

How can STPA be systematically applied to identify hazards in frontier AI systems?
How can the robustness of safety assurance for frontier AI be strengthened?
How can the scope and causal traceability of AI risk analysis be improved?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts STPA control-structure modelling and analysis to frontier AI
Derives Unsafe Control Actions and traces them to concrete Loss Scenarios
Complements existing governance techniques and supports LLM-assisted, scalable analysis