Enhancing Depression Detection via Question-wise Modality Fusion

📅 2025-03-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Automated depression screening faces two key bottlenecks: reliance on manually administered questionnaires, and multimodal fusion methods that ignore both the varying contribution of each modality per questionnaire question and the ordinal nature of severity labels. To address these, the authors propose QuestMF, a fine-grained, question-wise multimodal fusion framework. It adaptively weights acoustic, textual, and visual features for each question, and is trained with a novel Imbalanced Ordinal Log-Loss (ImbOLL) that jointly models the inherent ordering of depression severity levels and class imbalance. On the E-DAIC dataset, the approach performs comparably to current state-of-the-art models while adding interpretability: per-question severity scores help clinicians identify an individual's specific symptoms and tailor interventions accordingly. The implementation is publicly available.
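The question-wise fusion idea described above can be sketched roughly as follows: learn a separate set of modality weights for every questionnaire question, so that (for example) acoustic cues can dominate one question while text dominates another. This is a minimal illustrative simplification, not the paper's exact QuestMF architecture; the class name, tensor layout, and per-question linear head are all hypothetical.

```python
import torch
import torch.nn as nn

class QuestionWiseFusion(nn.Module):
    """Illustrative sketch: softmax-weighted fusion of modality
    embeddings, with an independent weight vector per question.
    (Hypothetical simplification of the paper's QuestMF mechanism.)"""

    def __init__(self, n_questions: int, n_modalities: int = 3, dim: int = 128):
        super().__init__()
        # one learnable logit per (question, modality) pair
        self.logits = nn.Parameter(torch.zeros(n_questions, n_modalities))
        # shared head mapping the fused embedding to a per-question score
        self.head = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_questions, n_modalities, dim)
        w = torch.softmax(self.logits, dim=-1)        # (Q, M) modality weights
        fused = (w.unsqueeze(-1) * feats).sum(dim=2)  # (B, Q, dim)
        return self.head(fused).squeeze(-1)           # (B, Q) question scores
```

Because the weights are parameters indexed by question, inspecting the softmaxed `logits` after training shows which modality each question relied on, which is one route to the interpretability the summary mentions.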

📝 Abstract
Depression is a highly prevalent and disabling condition that incurs substantial personal and societal costs. Current depression diagnosis involves determining the depression severity of a person through self-reported questionnaires or interviews conducted by clinicians. This often leads to delayed treatment and involves substantial human resources. Thus, several works try to automate the process using multimodal data. However, they usually overlook the following: i) The variable contribution of each modality for each question in the questionnaire and ii) Using ordinal classification for the task. This results in sub-optimal fusion and training methods. In this work, we propose a novel Question-wise Modality Fusion (QuestMF) framework trained with a novel Imbalanced Ordinal Log-Loss (ImbOLL) function to tackle these issues. The performance of our framework is comparable to the current state-of-the-art models on the E-DAIC dataset and enhances interpretability by predicting scores for each question. This will help clinicians identify an individual's symptoms, allowing them to customise their interventions accordingly. We also make the code for the QuestMF framework publicly available.
Problem

Research questions and friction points this paper is trying to address.

Automate depression detection using multimodal data fusion
Address variable modality contributions per questionnaire question
Improve interpretability for customized clinical interventions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Question-wise Modality Fusion for depression detection
Imbalanced Ordinal Log-Loss function for training
Interpretable per-question score prediction