Enhancing Depression Detection via Question-wise Modality Fusion

📅 2025-03-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Automated depression screening faces two key bottlenecks: reliance on manually administered questionnaires, and multimodal fusion methods that ignore both the varying contribution of each modality per questionnaire question and the ordinal nature of severity labels. To address these, the authors propose QuestMF, a fine-grained, question-wise multimodal fusion framework. It adaptively weights acoustic, textual, and visual features for each question, and is trained with a novel Imbalanced Ordinal Log-Loss (ImbOLL) that jointly models the inherent ordering of depression severity levels and class imbalance. On the E-DAIC dataset, the approach performs comparably to current state-of-the-art models while adding interpretability: per-question severity scores help clinicians identify an individual's specific symptoms and tailor interventions accordingly. The implementation is publicly available.
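The question-wise fusion idea described above can be sketched roughly as follows: learn a separate set of modality weights for every questionnaire question, so that (for example) acoustic cues can dominate one question while text dominates another. This is a minimal illustrative simplification, not the paper's exact QuestMF architecture; the class name, tensor layout, and per-question linear head are all hypothetical.

```python
import torch
import torch.nn as nn

class QuestionWiseFusion(nn.Module):
    """Illustrative sketch: softmax-weighted fusion of modality
    embeddings, with an independent weight vector per question.
    (Hypothetical simplification of the paper's QuestMF mechanism.)"""

    def __init__(self, n_questions: int, n_modalities: int = 3, dim: int = 128):
        super().__init__()
        # one learnable logit per (question, modality) pair
        self.logits = nn.Parameter(torch.zeros(n_questions, n_modalities))
        # shared head mapping the fused embedding to a per-question score
        self.head = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_questions, n_modalities, dim)
        w = torch.softmax(self.logits, dim=-1)        # (Q, M) modality weights
        fused = (w.unsqueeze(-1) * feats).sum(dim=2)  # (B, Q, dim)
        return self.head(fused).squeeze(-1)           # (B, Q) question scores
```

Because the weights are parameters indexed by question, inspecting the softmaxed `logits` after training shows which modality each question relied on, which is one route to the interpretability the summary mentions.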

📝 Abstract
Depression is a highly prevalent and disabling condition that incurs substantial personal and societal costs. Current depression diagnosis involves determining the depression severity of a person through self-reported questionnaires or interviews conducted by clinicians. This often leads to delayed treatment and involves substantial human resources. Thus, several works try to automate the process using multimodal data. However, they usually overlook the following: i) The variable contribution of each modality for each question in the questionnaire and ii) Using ordinal classification for the task. This results in sub-optimal fusion and training methods. In this work, we propose a novel Question-wise Modality Fusion (QuestMF) framework trained with a novel Imbalanced Ordinal Log-Loss (ImbOLL) function to tackle these issues. The performance of our framework is comparable to the current state-of-the-art models on the E-DAIC dataset and enhances interpretability by predicting scores for each question. This will help clinicians identify an individual's symptoms, allowing them to customise their interventions accordingly. We also make the code for the QuestMF framework publicly available.
Problem

Research questions and friction points this paper is trying to address.

Automate depression detection using multimodal data fusion
Address variable modality contributions per questionnaire question
Improve interpretability for customized clinical interventions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Question-wise Modality Fusion for depression detection
Imbalanced Ordinal Log-Loss function for training
Interpretable per-question score prediction