🤖 AI Summary
High-quality data are critical for effective road traffic crash prevention, yet Alcohol Inference Mismatch (AIM), an inconsistency between the alcohol-related conclusion drawn from a crash narrative and the one recorded in structured database fields, undermines data reliability. This paper introduces a database-narrative semantic alignment framework that combines BERT-based natural language representations with Probit/Logit statistical modeling to detect and quantify AIM. Applied to 371,062 crash records from Iowa (2016-2022), the method identifies 2,767 AIM cases, an overall AIM percentage of 24.03%, and geographic clusters where mismatches are concentrated. Rather than relying only on field-level comparisons, the framework aligns the semantic content of unstructured narratives with structured fields, which makes AIM detection more fine-grained and interpretable. The results support crash data quality assessment, evidence-based law enforcement training, and targeted transportation policy.
📝 Abstract
Road traffic crashes are a significant global cause of fatalities, which makes accurate crash data essential for improving prevention strategies and informing policy development. This study addresses the challenge of alcohol inference mismatch (AIM) by using database-narrative alignment to identify AIM in crash data. A framework was developed to improve data quality in crash management systems and reduce the percentage of AIM crashes. A BERT-based analysis of 371,062 crash records from Iowa (2016-2022) identified 2,767 AIM incidents, corresponding to an overall AIM percentage of 24.03%. Statistical tools, including probit and logit models, were used to explore the crash characteristics that affect AIM patterns. The findings indicate that alcohol-related fatal crashes and nighttime incidents have a lower percentage of mismatch, while crashes involving unknown vehicle types and older drivers are more susceptible to mismatch. The geospatial cluster analysis conducted as part of this study identifies regions with a greater need for education and training. These insights highlight the need for targeted training programs and dedicated data management teams to improve the accuracy of crash reporting and support evidence-based policymaking.
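To make the notion of a mismatch concrete, the following minimal sketch flags a record as AIM when the alcohol indication in the structured field disagrees with the one inferred from the narrative, and then computes an AIM percentage. It is an illustration under stated assumptions, not the authors' implementation: the `CrashRecord` type, its field names, and the toy records are hypothetical, and the narrative flag is assumed to come from some upstream text classifier such as a BERT model. The abstract does not define the denominator of the 24.03% figure; since 2,767 of 371,062 records would be under 1%, the percentage is evidently taken over a subset, and the sketch assumes that subset is the crashes flagged as alcohol-related by either source.

```python
from dataclasses import dataclass


@dataclass
class CrashRecord:
    """Hypothetical record layout; the field names are illustrative only."""
    crash_id: str
    db_alcohol: bool         # alcohol indication in the structured database field
    narrative_alcohol: bool  # alcohol indication inferred from the narrative text


def is_aim(record: CrashRecord) -> bool:
    """A record is a mismatch when the two sources disagree."""
    return record.db_alcohol != record.narrative_alcohol


def aim_percentage(records: list[CrashRecord]) -> float:
    """Share of mismatches among crashes flagged as alcohol-related by either source.

    Assumption: this denominator is a guess; the abstract does not state
    which subset of crashes the reported percentage is based on.
    """
    relevant = [r for r in records if r.db_alcohol or r.narrative_alcohol]
    if not relevant:
        return 0.0
    mismatches = sum(is_aim(r) for r in relevant)
    return 100.0 * mismatches / len(relevant)


# Toy data, made up for illustration (not taken from the Iowa dataset).
records = [
    CrashRecord("A", db_alcohol=True, narrative_alcohol=True),
    CrashRecord("B", db_alcohol=False, narrative_alcohol=True),
    CrashRecord("C", db_alcohol=True, narrative_alcohol=False),
    CrashRecord("D", db_alcohol=False, narrative_alcohol=False),
    CrashRecord("E", db_alcohol=True, narrative_alcohol=True),
]

print(f"AIM percentage: {aim_percentage(records):.2f}%")
```

In this toy set, four records are alcohol-related by at least one source and two of them disagree. A per-record flag like `is_aim` is also the kind of binary outcome that a probit or logit model could then relate to crash characteristics such as severity, time of day, vehicle type, or driver age.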