Probabilistic Record Linkage of Two Gun Violence Data Sets

2025-03-02
Citations: 0
Influential: 0
AI Summary
This study addresses the challenge of data silos in gun violence research by introducing a cross-source probabilistic linkage framework that integrates geographically granular Gun Violence Archive (GVA) records with sociodemographic variables from the National Violent Death Reporting System (NVDRS). Under strict privacy constraints and limited field availability, the method extends the Fellegi-Sunter model through tailored feature engineering, fuzzy matching, and iterative expert validation, linking records across heterogeneous schemas and reporting standards. The framework matched 27,420 gun violence incidents; manual verification of the 942 matches that could be checked against GVA online records confirmed 849 of them, an accuracy of 90.12%. The approach addresses longstanding barriers to integrating multi-source, heterogeneous violent injury data and offers a reproducible, scalable way to combine the two sources in support of public health-driven intervention strategies.
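
To make the Fellegi-Sunter idea concrete, the sketch below scores a candidate record pair by summing per-field log-likelihood ratios, with a fuzzy comparison for city names. It is only an illustration: the field names, the m/u probabilities, the fuzzy threshold, and the toy records are all assumptions made here, not values or code from the paper.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-field probabilities (NOT taken from the paper):
#   m = P(field agrees | the two records describe the same incident)
#   u = P(field agrees | the two records describe different incidents)
FIELD_PARAMS = {
    "date": (0.95, 0.01),
    "state": (0.99, 0.05),
    "city": (0.90, 0.02),
    "victim_age": (0.85, 0.03),
}


def agrees(field, a, b, fuzzy_threshold=0.85):
    """Return True/False for agreement, or None when a value is missing."""
    if a is None or b is None:
        return None  # a missing value contributes no evidence either way
    if field == "city":
        # Fuzzy comparison tolerates spelling variants in place names.
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        return ratio >= fuzzy_threshold
    return a == b


def match_weight(rec_a, rec_b):
    """Sum of log2 likelihood ratios over all compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        outcome = agrees(field, rec_a.get(field), rec_b.get(field))
        if outcome is None:
            continue
        if outcome:
            total += math.log2(m / u)  # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


# Toy records invented for this example.
gva = {"date": "2019-05-04", "state": "OH", "city": "Cincinnatti", "victim_age": 27}
nvdrs_same = {"date": "2019-05-04", "state": "OH", "city": "Cincinnati", "victim_age": 27}
nvdrs_other = {"date": "2019-05-04", "state": "OH", "city": "Cleveland", "victim_age": 41}

print(match_weight(gva, nvdrs_same))   # higher weight: all fields agree
print(match_weight(gva, nvdrs_other))  # lower weight: city and age disagree
```

In a full linkage run, pairs whose total weight exceeds an upper threshold would be accepted as links, pairs below a lower threshold rejected, and the remainder sent to manual review; where those thresholds sit is a design choice this sketch leaves open.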

Abstract
Objective: Gun violence is a serious public health problem in the United States. The Gun Violence Archive (GVA) provides detailed geographic information, while the National Violent Death Reporting System (NVDRS) offers demographic, socioeconomic, and narrative data on gun homicides. We developed and tested a method for merging datasets to inform analysis and strategies to reduce gun violence rates in the United States. Methods: After preprocessing the data, we used a probabilistic record linkage program to link records from the GVA (n = 36,245) with records from the NVDRS (n = 30,592). We evaluated sensitivity (the false match rate) by using a manual approach. Results: The linkage returned 27,420 matches of gun violence incidents from the GVA and NVDRS datasets. Because of restricted details accessible from GVA online records, only 942 of these matched records could be manually evaluated. Our framework achieved a 90.12% (849 of 942) accuracy rate in linking GVA incidents with corresponding NVDRS records. Practice Implications: Electronic linkage of gun violence data from 2 sources is feasible and can be used to increase the utility of the datasets.
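
The figures in the abstract can be checked with a few lines of arithmetic. The sketch below recomputes the share of reviewed links that were confirmed, the share of all links that were reviewed, and a 95% Wilson score interval around the confirmed share. The interval is an addition made here for context and is not reported in the abstract; the variable names are illustrative.

```python
import math

# Counts as reported in the abstract.
gva_records = 36_245
nvdrs_records = 30_592
linked_pairs = 27_420
reviewed = 942
confirmed = 849

accuracy = confirmed / reviewed          # share of reviewed links judged correct
false_match_rate = 1 - accuracy          # share of reviewed links judged wrong
share_reviewed = reviewed / linked_pairs # how much of the linked set was checked


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


low, high = wilson_interval(confirmed, reviewed)

print(f"confirmed share of reviewed links: {accuracy:.4f}")
print(f"false matches among reviewed links: {false_match_rate:.4f}")
print(f"share of links reviewed: {share_reviewed:.4f}")
print(f"95% Wilson interval: ({low:.4f}, {high:.4f})")
```

Because only the links that could be checked against GVA online records were reviewed, the interval describes those 942 links; whether it generalizes to all 27,420 depends on how representative that subset is.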
Problem

Research questions and friction points this paper is trying to address.

Merge GVA and NVDRS datasets to enhance gun violence analysis
Develop probabilistic linkage method for accurate data matching
Improve dataset utility by combining geographic and demographic data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic record linkage for merging datasets
Manual evaluation to assess linkage accuracy
Combining GVA and NVDRS data for analysis
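
Linking tens of thousands of records from each source usually requires limiting which pairs are compared at all. The sketch below shows one common way to do that, blocking on shared state and date and then assigning links greedily one-to-one. The summary does not say whether the authors used blocking, so this is an assumption, as are the field names, the city-name similarity score, the `min_score` threshold, and the toy rows.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(rec):
    """Only records sharing this key are compared (hypothetical choice)."""
    return (rec["state"], rec["date"])


def candidate_pairs(gva_rows, nvdrs_rows):
    """Yield (gva, nvdrs) pairs that fall in the same block."""
    index = defaultdict(list)
    for row in nvdrs_rows:
        index[block_key(row)].append(row)
    for g in gva_rows:
        for n in index.get(block_key(g), []):
            yield g, n


def city_similarity(a, b):
    """Stand-in pair score: fuzzy similarity of city names in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_one_to_one(gva_rows, nvdrs_rows, min_score=0.8):
    """Greedy assignment: best-scoring pairs first, each record used once."""
    scored = sorted(
        (
            (city_similarity(g["city"], n["city"]), g["id"], n["id"])
            for g, n in candidate_pairs(gva_rows, nvdrs_rows)
        ),
        reverse=True,
    )
    used_gva, used_nvdrs, links = set(), set(), []
    for score, gid, nid in scored:
        if score < min_score:
            break  # remaining pairs score even lower
        if gid in used_gva or nid in used_nvdrs:
            continue
        used_gva.add(gid)
        used_nvdrs.add(nid)
        links.append((gid, nid))
    return links


# Toy rows invented for this example.
gva_rows = [
    {"id": "G1", "state": "OH", "date": "2019-05-04", "city": "Cincinnati"},
    {"id": "G2", "state": "OH", "date": "2019-05-04", "city": "Cleveland"},
    {"id": "G3", "state": "TX", "date": "2019-05-04", "city": "Houston"},
]
nvdrs_rows = [
    {"id": "N1", "state": "OH", "date": "2019-05-04", "city": "Cincinnati"},
    {"id": "N2", "state": "OH", "date": "2019-05-04", "city": "Cleveland"},
    {"id": "N3", "state": "TX", "date": "2019-05-05", "city": "Houston"},
]

links = link_one_to_one(gva_rows, nvdrs_rows)
print(sorted(links))
```

Strict blocking trades recall for speed: G3 and N3 are never compared because their dates differ by a day, which is the kind of miss that looser block keys or multiple blocking passes are meant to catch.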