YouthSafe: A Youth-Centric Safety Benchmark and Safeguard Model for Large Language Models

📅 2025-09-10
🤖 AI Summary
Current safety benchmarks and moderation systems lack targeted evaluation of the vulnerabilities specific to adolescents and young adults, such as emotional dependency, identity exploration, and boundary ambiguity, when they interact with large language models (LLMs). This leaves users exposed to latent risks in emotional support, creative expression, and educational applications. To address this gap, we introduce YAIR, the first fine-grained safety benchmark specifically designed for youth, covering 78 distinct risk categories. We further propose YouthSafe, a dedicated real-time risk detection model that integrates deep semantic understanding with contextual modeling to significantly improve identification of nuanced threats, including grooming, identity confusion, and microaggressions. Empirical evaluation demonstrates that mainstream moderation systems perform poorly on YAIR, whereas YouthSafe achieves substantial gains in accuracy, recall, and F1-score over baseline methods. This work establishes a new paradigm for age-appropriate safety governance in generative AI.

📝 Abstract
Large Language Models (LLMs) are increasingly used by teenagers and young adults in everyday life, ranging from emotional support and creative expression to educational assistance. However, their unique vulnerabilities and risk profiles remain under-examined in current safety benchmarks and moderation systems, leaving this population disproportionately exposed to harm. In this work, we present Youth AI Risk (YAIR), the first benchmark dataset designed to evaluate and improve the safety of youth-LLM interactions. YAIR consists of 12,449 annotated conversation snippets spanning 78 fine-grained risk types, grounded in a taxonomy of youth-specific harms such as grooming, boundary violation, identity confusion, and emotional overreliance. We systematically evaluate widely adopted moderation models on YAIR and find that existing approaches substantially underperform in detecting youth-centered risks, often missing contextually subtle yet developmentally harmful interactions. To address these gaps, we introduce YouthSafe, a real-time risk detection model optimized for youth GenAI contexts. YouthSafe significantly outperforms prior systems across multiple metrics on risk detection and classification, offering a concrete step toward safer and more developmentally appropriate AI interactions for young users.
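To make "metrics on risk detection" concrete, the following is a minimal sketch of how a moderation model's binary risk flags might be scored against annotated snippets. It is not the paper's evaluation code: the function name `detection_metrics` and the toy labels are assumptions made purely for illustration, and the abstract does not specify which metrics or averaging scheme the authors used.

```python
def detection_metrics(gold, predicted):
    """Return (precision, recall, f1) for binary risk detection.

    gold      -- list of bools, True if annotators marked the snippet as risky
    predicted -- list of bools, True if the moderation model flagged the snippet
    """
    tp = sum(g and p for g, p in zip(gold, predicted))        # risky and flagged
    fp = sum((not g) and p for g, p in zip(gold, predicted))  # safe but flagged
    fn = sum(g and (not p) for g, p in zip(gold, predicted))  # risky but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels, invented for illustration only (not taken from YAIR).
gold_labels = [True, True, True, False, False]
model_flags = [True, False, False, False, True]

precision, recall, f1 = detection_metrics(gold_labels, model_flags)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the model misses two of three risky snippets, so recall is low even though half of its flags are correct. That is the kind of failure the abstract describes when it says existing systems miss contextually subtle interactions.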
Problem

Research questions and friction points this paper is trying to address.

Addressing youth-specific vulnerabilities in LLM safety benchmarks
Detecting contextually subtle yet harmful AI interactions for youth
Improving real-time risk detection in youth-centric GenAI contexts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Youth-specific benchmark dataset YAIR
Real-time risk detection model YouthSafe
Outperforms existing moderation systems significantly
Yaman Yu
University of Illinois Urbana-Champaign
usable privacy and security, human-computer interaction, Accessibility, Web3
Yiren Liu
University of Illinois at Urbana-Champaign
Human Computer Interaction
Jacky Zhang
University of Illinois Urbana–Champaign
Yun Huang
University of Illinois Urbana–Champaign
Yang Wang
University of Illinois Urbana–Champaign