🤖 AI Summary
This work addresses the limited reliability of language models in detecting mental health crises, covering seven high-risk scenario types including suicidal ideation, sexual assault, and domestic violence. To this end, the authors introduce CRADLE BENCH, a multi-crisis detection benchmark whose crisis types are defined in line with clinical diagnostic standards and which is the first to incorporate temporal labels. The benchmark provides 600 clinician-annotated test examples and 420 development examples, together with a training corpus of around 4K examples labeled automatically by majority-vote ensembling over multiple language models, an approach that significantly outperforms single-model annotation. The authors further fine-tune six crisis detection models on subsets defined by consensus and unanimous ensemble agreement, yielding complementary models trained under different agreement criteria. This work establishes a reproducible, scalable evaluation and modeling paradigm for high-risk content identification in mental health applications.
📝 Abstract
Detecting mental health crisis situations such as suicidal ideation, rape, domestic violence, child abuse, and sexual harassment is a critical yet underexplored challenge for language models. When such situations arise during user–model interactions, models must reliably flag them, as failure to do so can have serious consequences. In this work, we introduce CRADLE BENCH, a benchmark for multi-faceted crisis detection. Unlike previous efforts that focus on a limited set of crisis types, our benchmark covers seven types defined in line with clinical standards and is the first to incorporate temporal labels. Our benchmark provides 600 clinician-annotated evaluation examples and 420 development examples, together with a training corpus of around 4K examples automatically labeled using a majority-vote ensemble of multiple language models, which significantly outperforms single-model annotation. We further fine-tune six crisis detection models on subsets defined by consensus and unanimous ensemble agreement, providing complementary models trained under different agreement criteria.
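The abstract describes automatic labeling by majority vote over multiple model annotators, with training subsets split by agreement level. The paper's exact aggregation rule is not given here; a minimal sketch of one plausible scheme, distinguishing unanimous from simple-majority (consensus) agreement, might look like:

```python
from collections import Counter

def ensemble_label(annotations):
    """Aggregate one example's labels from several model annotators.

    Returns (label, agreement), where agreement is "unanimous" when all
    annotators agree, "consensus" for a strict majority, and (None, None)
    when no label wins a majority (the example would be discarded).
    """
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    if votes == len(annotations):
        return label, "unanimous"
    if votes > len(annotations) / 2:
        return label, "consensus"
    return None, None

# Hypothetical per-example annotations from three model annotators
print(ensemble_label(["suicidal_ideation"] * 3))             # unanimous
print(ensemble_label(["domestic_violence",
                      "domestic_violence", "none"]))         # consensus
print(ensemble_label(["none", "child_abuse", "rape"]))       # no majority
```

Filtering the 4K training corpus by the returned agreement tag would yield the consensus and unanimous subsets on which the six models are fine-tuned.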