🤖 AI Summary
Aviation safety incident reports are short, highly domain-specific, and thematically dense, which makes their underlying themes hard to extract. This study systematically evaluates four topic modeling techniques (pLSA, LSA, LDA, and NMF) on Australian Transport Safety Bureau (ATSB) incident narratives, the first such assessment in aviation safety analysis. Using standardized text preprocessing and a multi-dimensional evaluation framework covering topic coherence, operational interpretability, and keyword focus, it finds that LDA performs best overall in thematic coherence and safety-practice interpretability, whereas NMF is notably stronger at precise keyword extraction. The study delineates where each model is applicable within aviation safety and where it falls short, giving regulatory agencies empirically grounded guidance for selecting a topic modeling approach. In closing this methodological gap, it advances the application of NLP to short-text safety analytics in high-stakes, domain-intensive settings.
📝 Abstract
Improvements in aviation safety analysis call for innovative techniques to extract valuable insights from the abundance of textual data available in accident reports. This paper explores the application of four prominent topic modelling techniques, namely Probabilistic Latent Semantic Analysis (pLSA), Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-negative Matrix Factorization (NMF), to dissect aviation incident narratives using the Australian Transport Safety Bureau (ATSB) dataset. The study examines each technique's ability to unveil latent thematic structures within the data, providing safety professionals with a systematic approach to gain actionable insights. Through a comparative analysis, this research not only showcases the potential of these methods in aviation safety but also elucidates their distinct advantages and limitations.
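To make concrete what one of the compared techniques does, the following is a minimal pure-Python sketch of NMF fitted with Lee and Seung's multiplicative updates for the squared-error objective. It is not the authors' pipeline: the narratives are invented toy sentences (not ATSB records), the stopword list is a small illustrative one, and the helper names (`term_document_matrix`, `nmf`, `top_terms`) are hypothetical. A real analysis would more likely rely on an established library implementation.

```python
import random
import re
from collections import Counter

# Hypothetical toy narratives, invented for illustration (not ATSB records).
NARRATIVES = [
    "engine failure during climb, pilot returned to the airport",
    "bird strike on approach damaged the engine",
    "runway incursion by a ground vehicle during taxi",
    "aircraft taxied onto the active runway without clearance",
    "engine fire warning after takeoff, crew shut down the engine",
    "vehicle crossed the runway while aircraft was on approach",
]

# Small illustrative stopword list (an assumption, not the paper's preprocessing).
STOPWORDS = {"the", "a", "to", "on", "by", "during", "was", "while",
             "after", "without", "onto", "down"}


def term_document_matrix(docs):
    """Return a documents x vocabulary matrix of raw term counts."""
    tokenised = [
        [w for w in re.findall(r"[a-z]+", doc.lower()) if w not in STOPWORDS]
        for doc in docs
    ]
    vocab = sorted({w for doc in tokenised for w in doc})
    rows = []
    for doc in tokenised:
        counts = Counter(doc)
        rows.append([float(counts[w]) for w in vocab])
    return rows, vocab


def matmul(a, b):
    """Plain nested-list matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_loss(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factorise V (docs x terms) into non-negative W (docs x topics) and H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


def top_terms(h, vocab, n_terms=3):
    """Return the n_terms highest-weighted vocabulary terms for each topic row of H."""
    topics = []
    for row in h:
        order = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)
        topics.append([vocab[j] for j in order[:n_terms]])
    return topics


if __name__ == "__main__":
    v, vocab = term_document_matrix(NARRATIVES)
    w, h = nmf(v, n_topics=2)
    for k, terms in enumerate(top_terms(h, vocab)):
        print(f"topic {k}: {', '.join(terms)}")
```

Each row of `H` is a topic expressed as non-negative term weights, so reading off the largest weights yields a keyword list per topic. Each row of `W` gives the corresponding topic weights for one narrative.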