ChildGuard: A Specialized Dataset for Combatting Child-Targeted Hate Speech

📅 2025-06-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing hate speech datasets lack child-specific annotations, failing to capture contextual nuances and the distinct psychological impact of hate speech on children. Method: We introduce ChildGuard—the first dedicated hate speech dataset for children—featuring three-tier annotations: age grouping, provenance-aware contextual grounding, and child-specific emotional impact intensity, with sensitivity to developmental vulnerability as a novel dimension. Built via multi-source data curation and human-in-the-loop annotation, it comprises 12K high-quality samples. We conduct benchmark evaluations across SOTA models (BERT, RoBERTa, Llama-3). Results: Empirical analysis reveals an average false-negative rate of 43.7% for child-directed hate speech across mainstream models. This work bridges critical gaps in age-specificity and psychological sensitivity, advancing data paradigms for child digital safety, and provides a reproducible benchmark alongside open-source resources.

Technology Category

Application Category

📝 Abstract

The increasing prevalence of child-targeted hate speech online underscores the urgent need for specialized datasets to address this critical issue. Existing hate speech datasets lack agespecific annotations, fail to capture nuanced contexts, and overlook the unique emotional impact on children. To bridge this gap, we introduce ChildGuard1, a curated dataset derived from existing corpora and enriched with child-specific annotations. ChildGuard captures diverse contexts of child-targeted hate speech, spanning age groups. We benchmark existing state-of-the-art hate speech detection methods, including Large Language Models (LLMs), and assess their effectiveness in detecting and contextualizing child-targeted hate speech. To foster further research in this area, we publicly release ChildGuard, providing a robust foundation for developing improved methods to detect and mitigate such harm.

Problem

Research questions and friction points this paper is trying to address.

Lack of age-specific hate speech datasets for children

Insufficient context and emotional impact analysis in existing datasets

Need for improved detection methods for child-targeted hate speech

Innovation

Methods, ideas, or system contributions that make the work stand out.

Curated dataset with child-specific annotations

Benchmarked hate speech detection methods including LLMs

Publicly released dataset for further research

🔎 Similar Papers

No similar papers found.

Authors to Follow