Probabilistic Record Linkage of Two Gun Violence Data Sets

Date: 2025-03-02
AI Summary
This study addresses the challenge of data silos in gun violence research by introducing the first cross-source probabilistic linkage framework to integrate geographically granular Gun Violence Archive (GVA) records with sociodemographic variables from the National Violent Death Reporting System (NVDRS). Working under strict privacy constraints and with few fields available in both sources, the method extends the Fellegi-Sunter model through tailored feature engineering, fuzzy matching, and iterative expert validation, linking records across heterogeneous schemas and reporting standards. The framework matched 27,420 gun violence incidents; manual verification of the 942 matched records whose GVA online details permitted review yielded an accuracy of 90.12% (849 of 942). This approach overcomes longstanding barriers to integrating multi-source, heterogeneous violent injury data and provides a reproducible, scalable data fusion approach that supports public health-driven, precision intervention strategies.
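The summary mentions an extension of the Fellegi-Sunter model with fuzzy matching. The minimal Python sketch below shows only the core Fellegi-Sunter scoring step, not the authors' implementation. Each compared field contributes log2(m/u) when it agrees and log2((1 - m)/(1 - u)) when it disagrees, and the summed weight is compared against two thresholds. The field names, m- and u-probabilities, the 0.85 similarity cutoff, the one-year age tolerance, the decision thresholds, and the example records are all hypothetical. `difflib.SequenceMatcher` stands in for whatever string comparator the study actually used.

```python
import math
from difflib import SequenceMatcher

# Hypothetical (m, u) probabilities per field; NOT values from the study.
# m = P(field agrees | same incident), u = P(field agrees | different incidents)
M_U = {
    "incident_date": (0.95, 0.01),
    "state": (0.99, 0.05),
    "city": (0.90, 0.02),
    "victim_age": (0.90, 0.05),
    "victim_sex": (0.98, 0.50),
}


def city_agrees(a: str, b: str, cutoff: float = 0.85) -> bool:
    """Fuzzy string comparison: agree if the similarity ratio reaches the cutoff."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio() >= cutoff


def agreement_vector(gva: dict, nvdrs: dict) -> dict:
    """Binary agreement indicator for each compared (hypothetical) field."""
    return {
        "incident_date": gva["incident_date"] == nvdrs["incident_date"],
        "state": gva["state"] == nvdrs["state"],
        "city": city_agrees(gva["city"], nvdrs["city"]),
        "victim_age": abs(gva["victim_age"] - nvdrs["victim_age"]) <= 1,
        "victim_sex": gva["victim_sex"] == nvdrs["victim_sex"],
    }


def match_weight(agreements: dict) -> float:
    """Fellegi-Sunter composite weight: sum of per-field log2 likelihood ratios."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = M_U[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Three-way Fellegi-Sunter decision rule with hypothetical thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


# Hypothetical records: one GVA incident and two NVDRS candidates.
gva_record = {"incident_date": "2019-06-14", "state": "PA", "city": "Philadelphia",
              "victim_age": 24, "victim_sex": "M"}
nvdrs_same = {"incident_date": "2019-06-14", "state": "PA",
              "city": "Philadelpia",  # misspelled on purpose to exercise fuzzy matching
              "victim_age": 25, "victim_sex": "M"}
nvdrs_other = {"incident_date": "2019-08-02", "state": "PA", "city": "Pittsburgh",
               "victim_age": 41, "victim_sex": "M"}

for candidate in (nvdrs_same, nvdrs_other):
    w = match_weight(agreement_vector(gva_record, candidate))
    print(f"{candidate['city']:<12} weight={w:6.2f} -> {classify(w)}")
```

Pairs whose weight falls between the two thresholds would be routed to clerical review, which loosely corresponds to the expert validation step the summary describes.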

๐Ÿ“ Abstract
Objective: Gun violence is a serious public health problem in the United States. The Gun Violence Archive (GVA) provides detailed geographic information, while the National Violent Death Reporting System (NVDRS) offers demographic, socioeconomic, and narrative data on gun homicides. We developed and tested a method for merging these datasets to inform analyses and strategies for reducing gun violence in the United States.

Methods: After preprocessing the data, we used a probabilistic record linkage program to link records from the GVA (n = 36,245) with records from the NVDRS (n = 30,592). We assessed linkage accuracy, and hence the false match rate, through manual review of matched records.

Results: The linkage returned 27,420 matched gun violence incidents across the GVA and NVDRS datasets. Because GVA online records expose only limited details, only 942 of these matched records could be evaluated manually. Our framework correctly linked 849 of these 942 GVA incidents to their corresponding NVDRS records, an accuracy of 90.12%.

Practice Implications: Electronic linkage of gun violence data from two sources is feasible and can increase the utility of both datasets.
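The reported accuracy is a simple proportion of the manually reviewed links: 849/942 ≈ 0.901, so the remaining 93 of the 942 reviewed links (about 9.9%) were false matches. The Python sketch below reproduces that arithmetic and adds a Wilson score interval, which the abstract does not report and which is included here only as an illustration. The interval is meaningful only under the assumption that the 942 reviewed pairs behave like a random sample of all 27,420 links. Because they were selected by whether GVA details were accessible, that assumption may not hold.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts from the abstract: 849 of the 942 manually reviewed links were correct.
correct, reviewed = 849, 942
precision = correct / reviewed       # share of reviewed links that were true matches
false_match_rate = 1 - precision     # share of reviewed links that were wrong
low, high = wilson_interval(correct, reviewed)

print(f"precision        = {precision:.4f}")
print(f"false match rate = {false_match_rate:.4f}")
print(f"95% Wilson CI    = ({low:.3f}, {high:.3f})")
```

Manual review of matched pairs measures precision (the complement of the false match rate). Estimating sensitivity would additionally require identifying true matches that the linkage missed.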
Problem

Merge GVA and NVDRS datasets to enhance gun violence analysis
Develop a probabilistic linkage method for accurate data matching
Improve dataset utility by combining geographic and demographic data
Innovation

Probabilistic record linkage for merging datasets
Manual evaluation to assess linkage accuracy
Combining GVA and NVDRS data for analysis
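Probabilistic record linkage needs, for each compared field, an m-probability (agreement rate among true matches) and a u-probability (agreement rate among non-matches). The source does not say how its linkage program obtained these. One standard option is expectation-maximization (EM) over binary agreement vectors under a conditional-independence assumption. The Python sketch below implements that textbook EM on simulated data. It is not claimed to be the authors' program, and the starting values, the five fields, and the simulated probabilities are invented for illustration.

```python
import random
from math import prod

EPS = 1e-6  # keeps probabilities away from exactly 0 or 1


def _clamp(x: float) -> float:
    return min(max(x, EPS), 1 - EPS)


def em_fellegi_sunter(gammas, p=0.1, m=None, u=None, n_iter=200, tol=1e-8):
    """Estimate the match proportion p and per-field m/u probabilities by EM.

    gammas: list of binary agreement vectors (1 = field agrees).
    Assumes fields agree independently given true match status.
    """
    k = len(gammas[0])
    m = list(m) if m is not None else [0.9] * k  # start m high ...
    u = list(u) if u is not None else [0.1] * k  # ... and u low
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match.
        g = []
        for gamma in gammas:
            like_m = p * prod(mi if a else 1 - mi for mi, a in zip(m, gamma))
            like_u = (1 - p) * prod(ui if a else 1 - ui for ui, a in zip(u, gamma))
            g.append(like_m / (like_m + like_u))
        # M-step: match proportion and posterior-weighted agreement rates.
        sum_g = sum(g)
        sum_not_g = len(g) - sum_g
        new_p = sum_g / len(g)
        new_m = [_clamp(sum(gj * gamma[i] for gj, gamma in zip(g, gammas)) / sum_g)
                 for i in range(k)]
        new_u = [_clamp(sum((1 - gj) * gamma[i] for gj, gamma in zip(g, gammas)) / sum_not_g)
                 for i in range(k)]
        converged = abs(new_p - p) < tol
        p, m, u = new_p, new_m, new_u
        if converged:
            break
    return p, m, u


def simulate_pairs(n_pairs, p_true, m_true, u_true, seed=0):
    """Draw synthetic agreement vectors from a known two-class model."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        probs = m_true if rng.random() < p_true else u_true
        pairs.append([1 if rng.random() < q else 0 for q in probs])
    return pairs


# Invented "true" parameters for five hypothetical comparison fields.
M_TRUE = [0.95, 0.90, 0.85, 0.90, 0.95]
U_TRUE = [0.05, 0.10, 0.20, 0.05, 0.30]
pairs = simulate_pairs(2000, 0.2, M_TRUE, U_TRUE)
p_hat, m_hat, u_hat = em_fellegi_sunter(pairs)
print(f"p_hat = {p_hat:.3f}")
print("m_hat =", [round(x, 3) for x in m_hat])
print("u_hat =", [round(x, 3) for x in u_hat])
```

Starting with m above u keeps the estimated "match" class anchored to the high-agreement pairs. The clamping prevents a probability from collapsing to exactly 0 or 1, which would otherwise zero out a likelihood term.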
Iris Horng
Department of Statistics and Data Science, University of Pennsylvania, Philadelphia

Qishuo Yin
Princeton University
Statistics, causal inference

William Chan
Department of Economics, University of Pennsylvania, Philadelphia

Jared Murray
Associate Professor of Statistics and Machine Learning, University of Texas at Austin

Dylan S. Small
Department of Statistics and Data Science, University of Pennsylvania, Philadelphia