Towards Explainable Multimodal Depression Recognition for Clinical Interviews

2025-01-27
AI Summary
To address the limited interpretability of depression detection in clinical interviews, this paper proposes the Explainable Multimodal Depression Recognition for Clinical Interviews (EMDRC) task, which grounds recognition in the structure of the PHQ-8 scale. The authors design a PHQ-aware multimodal multi-task learning framework that captures utterance-level, symptom-related semantic information and uses it to generate a dialogue-level structured symptom summary alongside a depression severity prediction. On a dataset annotated on top of an existing MDRC dataset, the approach outperforms baseline systems on the EMDRC task. The aim is to give clinicians evidence for each prediction in the form of PHQ-8-aligned symptom summaries.

Abstract
Multimodal depression recognition for clinical interviews (MDRC) has recently attracted considerable attention. Existing MDRC studies mainly focus on improving task performance and have made significant progress. However, model transparency is critical for clinical applications, and previous work ignores the interpretability of the decision-making process. To address this issue, we propose the Explainable Multimodal Depression Recognition for Clinical Interviews (EMDRC) task, which aims to provide evidence for depression recognition by summarizing symptoms and uncovering their underlying causes. Given an interviewer-participant interaction, the goal of EMDRC is to produce a structured summary of the participant's symptoms based on the eight-item Patient Health Questionnaire depression scale (PHQ-8) and to predict the participant's depression severity. To tackle the EMDRC task, we construct a new dataset based on an existing MDRC dataset. Moreover, we use the PHQ-8 to design a PHQ-aware multimodal multi-task learning framework, which captures utterance-level symptom-related semantic information to help generate the dialogue-level summary. Experimental results on our annotated dataset demonstrate the superiority of our proposed method over baseline systems on the EMDRC task.
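To make the task output concrete, the following is a minimal sketch of how PHQ-8 item scores map to a severity label. It assumes the conventional PHQ-8 scoring (eight items, each scored 0 to 3, total 0 to 24, with cut points at 5, 10, 15, and 20). The abstract does not state which severity labels EMDRC actually uses, so the bands, the paraphrased item names, and the helper `severity_from_scores` are illustrative assumptions rather than the authors' implementation.

```python
# Paraphrased PHQ-8 item topics (assumption: the dataset's wording may differ).
PHQ8_ITEMS = (
    "little interest or pleasure",
    "feeling down or hopeless",
    "sleep problems",
    "tiredness or low energy",
    "appetite changes",
    "feeling bad about oneself",
    "trouble concentrating",
    "psychomotor slowing or restlessness",
)

# Conventional PHQ-8 severity bands as (lower bound on total score, label),
# ordered from highest to lowest so the first match wins.
SEVERITY_BANDS = (
    (20, "severe"),
    (15, "moderately severe"),
    (10, "moderate"),
    (5, "mild"),
    (0, "none/minimal"),
)


def severity_from_scores(scores):
    """Return (total, label) for a sequence of eight PHQ-8 item scores."""
    if len(scores) != len(PHQ8_ITEMS):
        raise ValueError("PHQ-8 requires exactly eight item scores")
    if any(not 0 <= s <= 3 for s in scores):
        raise ValueError("each PHQ-8 item score must be between 0 and 3")
    total = sum(scores)
    for lower_bound, label in SEVERITY_BANDS:
        if total >= lower_bound:
            return total, label


if __name__ == "__main__":
    # Hypothetical per-item scores for one interview.
    example = [2, 2, 1, 1, 1, 1, 1, 1]
    total, label = severity_from_scores(example)
    for item, score in zip(PHQ8_ITEMS, example):
        print(f"{item}: {score}")
    print(f"total={total}, severity={label}")
```

In an EMDRC-style system, each item would also carry a textual symptom summary drawn from the interview; the sketch covers only the scoring side.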
Problem

Research questions and friction points this paper is trying to address.

Interpretable Multimodal Depression Recognition
Clinical Interviews
Symptom Summary and Causal Explanation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explainable Multimodal Depression Recognition
PHQ-aware Multi-task Learning
Symptom-related Semantic Information
Authors

Wenjie Zheng
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
Qiming Xie
Nanjing University of Science and Technology
Natural Language Processing
Zengzhi Wang
Shanghai Jiao Tong University
Data Engineering, Complex Reasoning, Large Language Models, Natural Language Processing
Jianfei Yu
Singapore Management University
Natural Language Processing, Text Mining, Machine Learning
Rui Xia
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.