AI Summary
This study addresses the challenge of data silos in gun violence research by introducing the first cross-source probabilistic linkage framework to integrate geographically granular Gun Violence Archive (GVA) records with sociodemographic variables from the National Violent Death Reporting System (NVDRS). Working under strict privacy constraints and with few shared fields, the method extends the Fellegi-Sunter model through tailored feature engineering, fuzzy matching, and iterative expert validation, linking records across heterogeneous schemas and reporting standards. The framework matched 27,420 gun violence incidents; manual verification of the 942 matches that could be checked against GVA online records yielded an accuracy of 90.12% (849 of 942). The approach lowers a longstanding barrier to integrating multi-source, heterogeneous violent injury data and offers a reproducible, scalable data fusion method to support public health-driven, targeted intervention strategies.
Abstract
Objective: Gun violence is a serious public health problem in the United States. The Gun Violence Archive (GVA) provides detailed geographic information, while the National Violent Death Reporting System (NVDRS) offers demographic, socioeconomic, and narrative data on gun homicides. We developed and tested a method for merging the two datasets to inform analysis and strategies to reduce gun violence rates in the United States. Methods: After preprocessing the data, we used a probabilistic record linkage program to link records from the GVA (n = 36,245) with records from the NVDRS (n = 30,592). We evaluated the false-match rate by manually reviewing linked records. Results: The linkage returned 27,420 matches of gun violence incidents between the GVA and NVDRS datasets. Because GVA online records expose only limited detail, only 942 of these matched records could be manually evaluated. Our framework achieved an accuracy rate of 90.12% (849 of 942) in linking GVA incidents with corresponding NVDRS records. Practice Implications: Electronic linkage of gun violence data from two sources is feasible and can be used to increase the utility of the datasets.
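To make the probabilistic linkage step more concrete, the following is a minimal Python sketch of Fellegi-Sunter match weights with a fuzzy string comparison. It is an illustration under stated assumptions, not the study's implementation: the field names (`date`, `city`, `victim_age`), the m- and u-probabilities, the 0.85 similarity threshold, and the sample records are all hypothetical and do not come from the GVA or NVDRS data.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-field parameters (illustrative values, not from the study):
#   m = P(field agrees | the two records describe the same incident)
#   u = P(field agrees | the two records describe different incidents)
FIELD_PARAMS = {
    "date": (0.95, 0.01),
    "city": (0.90, 0.05),
    "victim_age": (0.85, 0.02),
}


def fuzzy_agree(a, b, threshold=0.85):
    """Count two values as agreeing when their string similarity meets the threshold."""
    ratio = SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()
    return ratio >= threshold


def match_weight(rec_a, rec_b):
    """Composite Fellegi-Sunter weight: sum of per-field log2 likelihood ratios."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if fuzzy_agree(rec_a[field], rec_b[field]):
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


# Made-up records for illustration only.
gva_record = {"date": "2019-03-14", "city": "Saint Louis", "victim_age": "27"}
nvdrs_close = {"date": "2019-03-14", "city": "Saint Luis", "victim_age": "27"}
nvdrs_far = {"date": "2018-11-02", "city": "Baltimore", "victim_age": "41"}

weight_close = match_weight(gva_record, nvdrs_close)
weight_far = match_weight(gva_record, nvdrs_far)
```

In a sketch like this, record pairs whose weight exceeds an upper threshold would be accepted as links, pairs below a lower threshold rejected, and pairs in between sent to manual review, which is the role the manual evaluation of 942 matches plays above.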