Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study evaluates the clinical viability of large language models (LLMs) for suicide prevention, focusing on two core tasks: identifying implicit suicidal ideation (IIS) and generating psychologically appropriate supportive responses (PAS). Method: We constructed a high-quality, theory-grounded psycholinguistic test set of 1,308 instances, informed by the Death/Suicide Implicit Association Test (D/S-IAT) and negative automatic thought theory. We introduced the first IIS-PAS dual-dimensional evaluation framework, employing psychometrically grounded prompt engineering, multi-turn context-sensitive assessment, human-in-the-loop annotation, and cross-model consistency analysis. Contribution/Results: Across eight state-of-the-art LLMs, average IIS detection accuracy remained below 40%, and over 60% of PAS responses were rated as risky or clinically inappropriate. These findings expose critical limitations in current LLMs’ capacity for safe, reliable clinical psychological intervention, establishing essential benchmarking standards and actionable pathways for developing trustworthy AI-driven mental health applications.
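To make the dual-dimensional protocol concrete, here is a minimal sketch of how such an evaluation loop could be wired up. This is purely illustrative and is not the authors' released code: the `TestCase` fields, the `detect`/`respond` model callables, and the `rate_response` judge (standing in for human-in-the-loop annotation) are all assumptions for demonstration.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TestCase:
    text: str                    # user message, possibly carrying implicit ideation
    has_implicit_ideation: bool  # gold label from human annotation (hypothetical field)

def evaluate_model(
    cases: List[TestCase],
    detect: Callable[[str], bool],              # IIS: does the model flag implicit ideation?
    respond: Callable[[str], str],              # PAS: model drafts a supportive reply
    rate_response: Callable[[str, str], bool],  # rubric/clinician judge: is the reply appropriate?
) -> dict:
    """Compute the two headline metrics: IIS detection accuracy and the
    fraction of supportive responses judged clinically appropriate (PAS)."""
    correct = 0
    appropriate = 0
    for case in cases:
        # Dimension 1: implicit suicidal ideation detection
        if detect(case.text) == case.has_implicit_ideation:
            correct += 1
        # Dimension 2: appropriateness of the generated supportive response
        if rate_response(case.text, respond(case.text)):
            appropriate += 1
    n = len(cases)  # assumed non-empty
    return {
        "iis_accuracy": correct / n,
        "pas_appropriate_rate": appropriate / n,
    }
```

Under this framing, the paper's headline results correspond to `iis_accuracy` averaging below 0.40 and `pas_appropriate_rate` below 0.40 (over 60% of responses rated risky or inappropriate) across the eight evaluated models.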

📝 Abstract
We present a comprehensive evaluation framework for assessing Large Language Models' (LLMs) capabilities in suicide prevention, focusing on two critical aspects: the Identification of Implicit Suicidal ideation (IIS) and the Provision of Appropriate Supportive responses (PAS). We introduce a novel dataset of 1,308 test cases built upon psychological frameworks, including the Death/Suicide Implicit Association Test (D/S-IAT) and Negative Automatic Thinking, alongside real-world scenarios. Through extensive experiments with eight widely used LLMs under different contextual settings, we find that current models struggle significantly to detect implicit suicidal ideation and to provide appropriate support, highlighting crucial limitations in applying LLMs to mental health contexts. Our findings underscore the need for more sophisticated approaches to developing and evaluating LLMs for sensitive psychological applications.
Problem

Research questions and friction points this paper is trying to address.

Evaluate LLMs' ability to detect implicit suicidal ideation
Assess LLMs' capacity to provide supportive responses
Highlight limitations in LLMs for mental health applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large language model evaluation
Implicit suicidal ideation detection
Dataset grounded in psychological frameworks
👥 Authors
Tong Li
Provable Responsible AI and Data Analytics (PRADA) Lab, King Abdullah University of Science and Technology; Washington University in St. Louis
Shu Yang
Provable Responsible AI and Data Analytics (PRADA) Lab, King Abdullah University of Science and Technology
Junchao Wu
University of Macau
Jiyao Wei
Institute of Computing Technology, Chinese Academy of Sciences
Lijie Hu
Assistant Professor, MBZUAI
Explainable AI · LLM · Differential Privacy
Mengdi Li
King Abdullah University of Science and Technology
Reinforcement Learning · LLMs · Robotics
Derek F. Wong
Professor, Department of Computer and Information Science, University of Macau
Machine Translation · Neural Machine Translation · Natural Language Processing · Machine Learning
Joshua R. Oltmanns
Assistant Professor, Washington University in St. Louis
personality · mental health · AI
Di Wang
Provable Responsible AI and Data Analytics (PRADA) Lab, King Abdullah University of Science and Technology