YouthSafe: A Youth-Centric Safety Benchmark and Safeguard Model for Large Language Models

📅 2025-09-10

🤖 AI Summary

Current safety benchmarks and moderation systems lack targeted evaluation of the vulnerabilities specific to adolescents and young adults, such as emotional dependency, identity exploration, and boundary ambiguity, when they interact with large language models (LLMs). This leaves these users exposed to latent risks in emotional support, creative expression, and educational applications. To address this gap, we introduce YAIR, the first fine-grained safety benchmark specifically designed for youth, covering 78 distinct risk categories. We further propose YouthSafe, a dedicated real-time risk detection model that integrates deep semantic understanding with contextual modeling to improve identification of nuanced threats, including grooming, identity confusion, and microaggressions. Empirical evaluation shows that mainstream moderation systems perform poorly on YAIR, whereas YouthSafe achieves substantial gains in accuracy, recall, and F1-score over baseline methods. The work offers a concrete step toward age-appropriate safety governance in generative AI.

📝 Abstract

Large Language Models (LLMs) are increasingly used by teenagers and young adults in everyday life, ranging from emotional support and creative expression to educational assistance. However, their unique vulnerabilities and risk profiles remain under-examined in current safety benchmarks and moderation systems, leaving this population disproportionately exposed to harm. In this work, we present Youth AI Risk (YAIR), the first benchmark dataset designed to evaluate and improve the safety of youth-LLM interactions. YAIR consists of 12,449 annotated conversation snippets spanning 78 fine-grained risk types, grounded in a taxonomy of youth-specific harms such as grooming, boundary violation, identity confusion, and emotional overreliance. We systematically evaluate widely adopted moderation models on YAIR and find that existing approaches substantially underperform in detecting youth-centered risks, often missing contextually subtle yet developmentally harmful interactions. To address these gaps, we introduce YouthSafe, a real-time risk detection model optimized for youth GenAI contexts. YouthSafe significantly outperforms prior systems across multiple metrics on risk detection and classification, offering a concrete step toward safer and more developmentally appropriate AI interactions for young users.
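To make the evaluation setup concrete, the following is a minimal sketch of how a binary risk detector could be scored against annotated snippets using accuracy, precision, recall, and F1. It is not the authors' evaluation code: the function name `score_risk_detector`, the `DetectionMetrics` container, and the toy labels are all assumptions introduced for illustration, and the paper's own protocol (including the 78-way risk-type classification) may differ.

```python
from dataclasses import dataclass


@dataclass
class DetectionMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def score_risk_detector(gold, predicted):
    """Compare binary risk labels (True = risky) with a detector's predictions.

    Hypothetical helper for illustration; not part of YAIR or YouthSafe.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one labeled snippet")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # risky, flagged
    fp = sum(1 for g, p in pairs if not g and p)      # safe, flagged
    fn = sum(1 for g, p in pairs if g and not p)      # risky, missed
    tn = sum(1 for g, p in pairs if not g and not p)  # safe, not flagged

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return DetectionMetrics(accuracy, precision, recall, f1)


# Toy labels invented for this example; they are not drawn from YAIR.
gold = [True, True, True, False, False, True]
predicted = [True, False, False, False, True, True]
metrics = score_risk_detector(gold, predicted)
print(metrics)
```

Recall is the metric most sensitive to the failure mode the abstract describes: a moderation model that misses contextually subtle interactions accumulates false negatives, which lowers recall even when accuracy looks acceptable.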

Problem

Research questions and friction points this paper is trying to address.

Addressing youth-specific vulnerabilities in LLM safety benchmarks

Detecting contextually subtle yet harmful AI interactions for youth

Improving real-time risk detection in youth-centric GenAI contexts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Youth-specific benchmark dataset YAIR

Real-time risk detection model YouthSafe

Significant performance gains over existing moderation systems
