Knowledge Acquisition on Mass-shooting Events via LLMs for AI-Driven Justice

📅 2025-04-17

📈 Citations: 0

✨ Influential: 0

career value

179K/year

🤖 AI Summary

Unstructured text from mass shooting incidents impedes judicial investigation and evidence-informed policymaking. Method: We introduce the first judicially oriented named entity recognition (NER) dataset specifically designed for mass shootings, annotating critical entities—including perpetrators, victims, locations, and weapons—and propose a few-shot prompting framework to systematically evaluate GPT-4o and o1-mini across heterogeneous text sources (news articles, police reports, social media). Contribution/Results: Experiments show GPT-4o achieves 89.7% Micro-F1 on real-world data; o1-mini attains comparable performance with higher inference efficiency. Performance improves significantly with increased shot count, revealing a synergistic optimization between LLM scale and judicial NER task complexity. This work formally defines the judicial NER task for mass shootings and establishes both a novel paradigm and a foundational benchmark resource for knowledge extraction in public safety research.

Technology Category

Application Category

📝 Abstract

Mass-shooting events pose a significant challenge to public safety, generating large volumes of unstructured textual data that hinder effective investigations and the formulation of public policy. Despite the urgency, few prior studies have effectively automated the extraction of key information from these events to support legal and investigative efforts. This paper presented the first dataset designed for knowledge acquisition on mass-shooting events through the application of named entity recognition (NER) techniques. It focuses on identifying key entities such as offenders, victims, locations, and criminal instruments, that are vital for legal and investigative purposes. The NER process is powered by Large Language Models (LLMs) using few-shot prompting, facilitating the efficient extraction and organization of critical information from diverse sources, including news articles, police reports, and social media. Experimental results on real-world mass-shooting corpora demonstrate that GPT-4o is the most effective model for mass-shooting NER, achieving the highest Micro Precision, Micro Recall, and Micro F1-scores. Meanwhile, o1-mini delivers competitive performance, making it a resource-efficient alternative for less complex NER tasks. It is also observed that increasing the shot count enhances the performance of all models, but the gains are more substantial for GPT-4o and o1-mini, highlighting their superior adaptability to few-shot learning scenarios.

Problem

Research questions and friction points this paper is trying to address.

Automate extraction of mass-shooting event data for legal investigations

Identify key entities like offenders, victims, and locations via NER

Evaluate LLMs for efficient information extraction from diverse sources

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs enable few-shot NER for mass-shooting data

GPT-4o achieves highest accuracy in entity recognition

o1-mini offers efficient alternative for simpler tasks

🔎 Similar Papers

Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval