🤖 AI Summary
Traditional HFACS methods suffer from poor scalability and low consistency in aviation safety analysis. Method: This paper proposes a reinforcement learning–based automated human factors classification framework, built upon the Llama-3.1-8B foundation model and integrating Group Relative Policy Optimization (GRPO), multi-component reward modeling, and domain-specific synthetic data generation to yield a lightweight, domain-adapted model. Contribution/Results: We introduce a novel multi-label evaluation benchmark based on exact-match accuracy, exposing the performance bottlenecks of general-purpose LLMs in fine-grained attribution tasks. Experiments show the model achieves an exact-match accuracy of 0.1800 (a 350% improvement) and a partial-match accuracy of 0.8800, substantially outperforming GPT-5-mini and Gemini-2.5-flash. With its compact size and low inference latency, the model enables edge deployment, providing a practical, real-time solution for aviation safety analysis.
📝 Abstract
Analyzing the human factors behind aviation accidents is crucial for preventing future incidents, yet traditional methods using the Human Factors Analysis and Classification System (HFACS) are limited by poor scalability and inconsistency. To address this, we introduce an automated HFACS classification framework for aviation safety analysis that uses reinforcement learning with Group Relative Policy Optimization (GRPO) to fine-tune a Llama-3.1-8B language model. Our approach incorporates a multi-component reward system tailored for aviation safety analysis and integrates synthetic data generation to overcome class imbalance in accident datasets. The resulting GRPO-optimized model achieved notable performance gains, including a 350% increase in exact-match accuracy (from 0.0400 to 0.1800) and an improved partial-match accuracy of 0.8800. Significantly, our specialized model outperforms state-of-the-art large language models (LLMs), including GPT-5-mini and Gemini-2.5-flash, on key metrics. This research also proposes exact-match accuracy on the multi-label HFACS classification problem as a new benchmarking methodology for evaluating the advanced reasoning capabilities of language models. Ultimately, our work validates that smaller, domain-optimized models can provide a computationally efficient and more effective solution for critical safety analysis, making powerful, low-latency deployment on resource-constrained edge devices feasible.
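The two headline metrics can be made concrete with a short sketch. This is a minimal illustration, not the paper's evaluation code: the exact-match definition (predicted HFACS label set equals the gold set) follows the abstract, while the partial-match definition used here (at least one label in common) is an assumption, and the HFACS category codes in the toy example are hypothetical.

```python
# Sketch of exact-match vs. partial-match accuracy for multi-label
# HFACS classification. Partial-match = "any label overlap" is an
# assumed definition; the category codes below are hypothetical.

def exact_match_accuracy(preds, golds):
    """Fraction of samples whose predicted label set equals the gold set."""
    return sum(set(p) == set(g) for p, g in zip(preds, golds)) / len(golds)

def partial_match_accuracy(preds, golds):
    """Fraction of samples sharing at least one label with the gold set."""
    return sum(bool(set(p) & set(g)) for p, g in zip(preds, golds)) / len(golds)

# Toy example with two accident reports
golds = [["UA-SkillError"], ["UA-Decision", "PC-AdverseState"]]
preds = [["UA-SkillError"], ["UA-Decision"]]

print(exact_match_accuracy(preds, golds))    # 0.5 (second sample misses a label)
print(partial_match_accuracy(preds, golds))  # 1.0 (both samples overlap)
```

Exact match is the stricter criterion, which is why scores like 0.1800 versus 0.8800 can coexist on the same test set.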