🤖 AI Summary
This study evaluates the clinical viability of large language models (LLMs) for suicide prevention, focusing on two core tasks: identifying implicit suicidal ideation (IIS) and generating psychologically appropriate supportive responses (PAS). Method: We constructed a high-quality, theory-grounded psycholinguistic test set of 1,308 instances, informed by the Death/Suicide Implicit Association Test (D/S-IAT) and negative automatic thought theory. We introduced the first IIS-PAS dual-dimensional evaluation framework, employing psychometrically grounded prompt engineering, multi-turn context-sensitive assessment, human-in-the-loop annotation, and cross-model consistency analysis. Contribution/Results: Across eight state-of-the-art LLMs, average IIS detection accuracy remained below 40%, and over 60% of PAS responses were rated as risky or clinically inappropriate. These findings expose critical limitations in current LLMs’ capacity for safe, reliable clinical psychological intervention, establishing essential benchmarking standards and actionable pathways for developing trustworthy AI-driven mental health applications.
📝 Abstract
We present a comprehensive evaluation framework for assessing Large Language Models' (LLMs) capabilities in suicide prevention, focusing on two critical aspects: the Identification of Implicit Suicidal ideation (IIS) and the Provision of Appropriate Supportive responses (PAS). We introduce a novel dataset of 1,308 test cases built upon psychological frameworks, including the D/S-IAT and Negative Automatic Thinking, alongside real-world scenarios. Through extensive experiments with eight widely used LLMs under different contextual settings, we find that current models struggle significantly both to detect implicit suicidal ideation and to provide appropriate support, highlighting crucial limitations in applying LLMs to mental health contexts. Our findings underscore the need for more sophisticated approaches to developing and evaluating LLMs for sensitive psychological applications.