A Frequency-aware Augmentation Network for Mental Disorders Assessment from Audio

📅 2025-01-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses depression severity assessment and automatic ADHD detection from speech. It proposes a Frequency-aware Dynamic Convolutional Network (FDCN) that takes spectrograms as input and models frequency and time separately: multi-scale convolutional filters capture discriminative frequency bands, dynamic convolutions whose kernel attentions are input-independent model time-varying acoustic patterns, and a spectrogram feature enhancement module makes the representation more robust. The key contribution is jointly modeling frequency selectivity and temporal dynamics to capture fine-grained acoustic cues related to psychiatric disorders. On the AVEC 2014 dataset, FDCN achieves an RMSE of 9.23 for depression severity estimation. On a self-recorded ADHD dataset, it attains 89.8% classification accuracy, outperforming existing speech-based mental health assessment methods.
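
The multi-scale idea can be illustrated with a minimal sketch, assuming a single-channel spectrogram stored as a list of rows indexed [freq_bin][time_frame]. Filters of several odd lengths slide along the frequency axis of every frame, and each filter produces one output map. The helper names (conv1d_freq, multi_scale_freq_conv), the kernel lengths, and the fixed weights are hypothetical: the summary does not give the paper's filter sizes, channel counts, or how the scales are fused, and a trained model would learn its weights, typically with 2-D convolutions in a deep-learning framework.

```python
from typing import List, Sequence

# Hypothetical layout: spectrogram indexed as spec[freq_bin][time_frame].
Matrix = List[List[float]]


def conv1d_freq(spec: Matrix, kernel: Sequence[float]) -> Matrix:
    """Slide one kernel along the frequency axis of every frame.

    Uses zero 'same' padding, so the output has the input's shape, and
    cross-correlation (no kernel flip), as deep-learning 'convolution' does.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd for 'same' padding")
    half = k // 2
    n_freq, n_time = len(spec), len(spec[0])
    out = [[0.0] * n_time for _ in range(n_freq)]
    for t in range(n_time):
        for f in range(n_freq):
            acc = 0.0
            for j, w in enumerate(kernel):
                src = f + j - half  # neighbouring frequency bin
                if 0 <= src < n_freq:  # bins outside the range count as zero
                    acc += w * spec[src][t]
            out[f][t] = acc
    return out


def multi_scale_freq_conv(spec: Matrix,
                          kernels: Sequence[Sequence[float]]) -> List[Matrix]:
    """Apply one filter per scale; each result is one output channel."""
    return [conv1d_freq(spec, k) for k in kernels]


if __name__ == "__main__":
    toy = [[1.0, 0.0], [2.0, 0.0], [3.0, 1.0], [4.0, 1.0]]  # 4 bins x 2 frames
    widths = (1, 3, 5)
    maps = multi_scale_freq_conv(toy, [[1.0] * w for w in widths])
    for w, fmap in zip(widths, maps):
        print(f"width {w}: {fmap}")
```

Wider filters pool energy from more neighbouring bins, so running several widths in parallel lets later layers weigh narrow and broad frequency patterns separately.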

📝 Abstract

Depression and Attention Deficit Hyperactivity Disorder (ADHD) stand out as common mental health challenges today. In affective computing, speech signals serve as effective biomarkers for mental disorder assessment. Current research relies on labor-intensive hand-crafted features or simplistic time-frequency representations and often overlooks critical details, because it does not account for the differing contributions of individual frequency bands or for temporal fluctuations. We therefore propose a frequency-aware augmentation network with dynamic convolution for depression and ADHD assessment. The method uses the spectrogram as its input feature and applies a multi-scale convolution that helps the network focus on frequency bands that discriminate mental disorders. A dynamic convolution aggregates multiple convolution kernels according to their attention weights, which are input-independent, to capture dynamic information. Finally, a feature augmentation block enhances the feature representation and makes full use of the captured information. Experiments on AVEC 2014 and a self-recorded ADHD dataset demonstrate the robustness of the method: it attains an RMSE of 9.23 for depression severity estimation and an accuracy of 89.8% for ADHD detection.
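
The kernel-aggregation step of a dynamic convolution can be sketched as follows, assuming a single-channel feature sequence along time and K candidate kernels of equal length. The attention logits are passed in as plain parameters, reflecting the abstract's description of them as input-independent; in the common dynamic-convolution formulation they are instead computed from the input (for example by global pooling followed by a small MLP), and the paper's exact scheme is not given here. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: positive attention weights summing to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_kernels(kernels: Sequence[Sequence[float]],
                      attn_logits: Sequence[float]) -> List[float]:
    """Combine K equal-length kernels into one: W = sum_k pi_k * W_k."""
    if len(kernels) != len(attn_logits):
        raise ValueError("need exactly one attention logit per kernel")
    if len({len(k) for k in kernels}) != 1:
        raise ValueError("all kernels must have the same length")
    pi = softmax(attn_logits)
    return [sum(p * k[i] for p, k in zip(pi, kernels))
            for i in range(len(kernels[0]))]


def conv1d_time(frames: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1-D cross-correlation along the time axis (no kernel flip)."""
    k = len(kernel)
    return [sum(kernel[j] * frames[t + j] for j in range(k))
            for t in range(len(frames) - k + 1)]


def dynamic_conv1d(frames: Sequence[float],
                   kernels: Sequence[Sequence[float]],
                   attn_logits: Sequence[float]) -> List[float]:
    """Aggregate the kernels first, then run a single convolution."""
    return conv1d_time(frames, aggregate_kernels(kernels, attn_logits))


if __name__ == "__main__":
    frames = [2.0, 4.0, 6.0]
    kernels = [[1.0, 0.0], [0.0, 1.0]]
    # Equal logits give weights [0.5, 0.5], i.e. the kernel [0.5, 0.5].
    print(dynamic_conv1d(frames, kernels, [0.0, 0.0]))  # → [3.0, 5.0]
```

Because convolution is linear in the kernel, convolving once with the aggregated kernel gives the same result as mixing the K individual outputs with the same weights, so the layer costs roughly one convolution rather than K.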

Problem

Research questions and friction points this paper is trying to address.

Speech Analysis

Mental Health

Depression and ADHD Detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforced Neural Network

Dynamic Convolution Technique

Mental Health Assessment

🔎 Similar Papers

A Frame-based Attention Interpretation Method for Relevant Acoustic Feature Extraction in Long Speech Depression Detection