AI Summary
Current safety benchmarks and moderation systems lack targeted evaluation of vulnerabilities specific to adolescents and young adults, such as emotional dependency, identity exploration, and boundary ambiguity, that arise when these users interact with large language models (LLMs). This gap exposes them to latent risks in emotional support, creative expression, and educational applications. To address it, we introduce YAIR, the first fine-grained safety benchmark specifically designed for youth, covering 78 distinct risk categories. We further propose YouthSafe, a dedicated real-time risk detection model that integrates deep semantic understanding with contextual modeling to significantly improve identification of nuanced threats, including grooming, identity confusion, and microaggressions. Empirical evaluation demonstrates that mainstream moderation systems perform poorly on YAIR, whereas YouthSafe achieves substantial gains in accuracy, recall, and F1-score over baseline methods. This work establishes a new paradigm for age-appropriate safety governance in generative AI.
Abstract
Large Language Models (LLMs) are increasingly used by teenagers and young adults in everyday life, for purposes ranging from emotional support and creative expression to educational assistance. However, the unique vulnerabilities and risk profiles of these users remain under-examined in current safety benchmarks and moderation systems, leaving this population disproportionately exposed to harm. In this work, we present Youth AI Risk (YAIR), the first benchmark dataset designed to evaluate and improve the safety of youth-LLM interactions. YAIR consists of 12,449 annotated conversation snippets spanning 78 fine-grained risk types, grounded in a taxonomy of youth-specific harms such as grooming, boundary violation, identity confusion, and emotional overreliance. We systematically evaluate widely adopted moderation models on YAIR and find that existing approaches substantially underperform in detecting youth-centered risks, often missing contextually subtle yet developmentally harmful interactions. To address these gaps, we introduce YouthSafe, a real-time risk detection model optimized for youth GenAI contexts. YouthSafe significantly outperforms prior systems across multiple metrics on risk detection and classification, offering a concrete step toward safer and more developmentally appropriate AI interactions for young users.
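To make the detection metrics concrete, the following is a minimal sketch of how a moderation model's binary risk flags could be scored against gold annotations. It is not the authors' evaluation code: the function name `score_risk_detection`, the `DetectionScores` container, and the toy labels are all hypothetical, and the sketch assumes each snippet carries a single risky/not-risky label.

```python
from dataclasses import dataclass


@dataclass
class DetectionScores:
    accuracy: float
    precision: float
    recall: float
    f1: float


def score_risk_detection(gold, predicted):
    """Score binary risk flags (True = risky) against gold annotations.

    Hypothetical helper for illustration; not part of YAIR or YouthSafe.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if len(gold) == 0:
        raise ValueError("need at least one labeled snippet")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # risky, flagged
    fp = sum(1 for g, p in pairs if not g and p)      # safe, flagged
    fn = sum(1 for g, p in pairs if g and not p)      # risky, missed
    tn = sum(1 for g, p in pairs if not g and not p)  # safe, not flagged

    accuracy = (tp + tn) / len(pairs)
    # Define a metric as 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return DetectionScores(accuracy, precision, recall, f1)


# Toy labels (made up): a detector that misses two of three risky snippets.
gold = [True, True, True, False, False]
predicted = [True, False, False, False, True]
scores = score_risk_detection(gold, predicted)
```

Recall is the metric most sensitive to the failure mode described above: a moderation model that misses subtle risky interactions accumulates false negatives, which lowers recall even when accuracy on clearly safe content stays high.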