Development and Evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening

📅 2025-07-08
🤖 AI Summary
Traditional PHQ-9 screening tools lack interactivity and adaptability. This study introduces HopeBot, an LLM-driven depression-screening chatbot that combines voice interaction, retrieval-augmented generation (RAG), and real-time clarification to turn the structured PHQ-9 assessment into a supportive, multi-turn dialogue. It is presented as the first system to combine RAG with real-time clarification for mental health screening, with a design aimed at cultural adaptability and responsiveness on sensitive topics. In a within-subject evaluation with 132 adults in the United Kingdom and China, HopeBot scores agreed strongly with self-administered PHQ-9 scores (ICC = 0.91). Overall, 87.1% of participants were willing to reuse or recommend it, 71% of the 75 participants who gave comparative feedback trusted it more than the conventional questionnaire, and mean comfort was 8.4/10. The results suggest that HopeBot can improve screening accessibility, user trust, and overall experience while remaining consistent with standard PHQ-9 scores.

📝 Abstract
Static tools like the Patient Health Questionnaire-9 (PHQ-9) screen for depression effectively but lack interactivity and adaptability. We developed HopeBot, a chatbot powered by a large language model (LLM) that administers the PHQ-9 using retrieval-augmented generation and real-time clarification. In a within-subject study, 132 adults in the United Kingdom and China completed both the self-administered and the chatbot versions. Scores demonstrated strong agreement (ICC = 0.91; 45% identical). Among the 75 participants who provided comparative feedback, 71% reported greater trust in the chatbot, citing its clearer structure, interpretive guidance, and supportive tone. Mean ratings (0-10) were 8.4 for comfort, 7.7 for voice clarity, 7.6 for handling sensitive topics, and 7.4 for recommendation helpfulness; the last of these varied significantly by employment status and prior mental-health service use (p < 0.05). Overall, 87.1% expressed willingness to reuse or recommend HopeBot. These findings demonstrate that voice-based LLM chatbots can feasibly serve as scalable, low-burden adjuncts for routine depression screening.
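The two agreement figures quoted in the abstract (an ICC and the share of identical scores) can be illustrated with a short sketch. The abstract does not say which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form ICC(2,1) is an assumption here, and the function names icc_2_1 and fraction_identical are hypothetical. This is not the authors' analysis code.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` holds one row per participant and one column per
    administration mode (e.g. self-administered, chatbot).
    Assumption: the study's exact ICC form is not stated in the abstract.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 rows and 2 equal-length columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are equal")
    return (ms_rows - ms_err) / denom


def fraction_identical(pairs: Sequence[Sequence[float]]) -> float:
    """Share of participants whose two total scores match exactly."""
    if not pairs:
        raise ValueError("no score pairs given")
    return sum(1 for a, b in pairs if a == b) / len(pairs)


# Toy data for illustration only (not study data): each pair is
# (self-administered total, chatbot total) for one participant.
toy_pairs = [(5, 5), (7, 8), (0, 0), (12, 10), (19, 19), (3, 4)]
print(f"ICC(2,1) = {icc_2_1(toy_pairs):.2f}")
print(f"identical = {fraction_identical(toy_pairs):.0%}")
```

A single-measurement form is used because each participant contributes one total score per mode. An analysis based on averaged scores, or on a consistency rather than absolute-agreement definition, would use a different ICC formula.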
Problem

Research questions and friction points this paper is trying to address.

Enhancing depression screening with interactive LLM-based chatbot
Improving PHQ-9 adaptability and user trust via real-time clarification
Evaluating chatbot effectiveness for scalable mental health screening
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based chatbot for interactive PHQ-9 screening
Retrieval-augmented generation for real-time clarification
Voice-based scalable depression screening solution
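As a rough illustration of the structured part of the screening, the sketch below scores PHQ-9 answers using the questionnaire's standard conventions: nine items scored 0 to 3, a total of 0 to 27, and severity cut-points at 5, 10, 15, and 20. Returning None for an unrecognized answer stands in for the point at which a chatbot would ask a clarifying question. That trigger, the exact-match lookup, and all function names are assumptions for illustration, not HopeBot's implementation.

```python
from typing import Optional, Sequence

# Standard PHQ-9 response options and their item scores.
RESPONSE_SCORES = {
    "not at all": 0,
    "several days": 1,
    "more than half the days": 2,
    "nearly every day": 3,
}

# Upper bound (inclusive) of each standard PHQ-9 severity band.
SEVERITY_BANDS = [
    (4, "minimal"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]


def score_answer(answer: str) -> Optional[int]:
    """Map an answer to 0-3, or None if it needs clarification.

    Assumption: a real system would interpret free text with an LLM;
    an exact-match lookup is used here to keep the sketch runnable.
    """
    return RESPONSE_SCORES.get(answer.strip().lower())


def phq9_total(item_scores: Sequence[int]) -> int:
    """Sum nine item scores after validating count and range."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item score must be 0, 1, 2, or 3")
    return sum(item_scores)


def severity(total: int) -> str:
    """Return the standard severity label for a PHQ-9 total score."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    for upper, label in SEVERITY_BANDS:
        if total <= upper:
            return label
    raise AssertionError("unreachable: bands cover 0-27")


# Hypothetical dialogue turn: the second answer is ambiguous, so the
# caller would re-ask or explain the response options.
for reply in ["Several days", "sometimes, I guess"]:
    item_score = score_answer(reply)
    if item_score is None:
        print(f"{reply!r}: needs clarification")
    else:
        print(f"{reply!r}: scored {item_score}")
```

Keeping the scoring deterministic and separate from the conversational layer is one way to keep chatbot totals comparable with the self-administered form, whatever wording the dialogue uses.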
Authors

Zhijun Guo
Alvina Lai
Julia Ive, University College London
Alexandru Petcu
Yutong Wang
Luyuan Qi
Johan H Thygesen
Kezhi Li