A Frequency-aware Augmentation Network for Mental Disorders Assessment from Audio

📅 2025-01-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses depression severity assessment and automatic ADHD detection from speech. We propose a Frequency-aware Dynamic Convolutional Network (FDCN) that takes spectrograms as input and employs a decoupled modeling strategy: multi-scale convolutional filters capture discriminative frequency bands, input-agnostic dynamic convolutions model time-adaptive acoustic patterns, and a spectrogram feature enhancement module improves representation robustness. Our key contribution is the first joint optimization of frequency selectivity and temporal dynamics for fine-grained modeling of psychiatric disorder–related acoustic cues. Evaluated on the AVEC 2014 dataset, FDCN achieves an RMSE of 9.23 for depression severity estimation. On a proprietary ADHD dataset, it attains 89.8% classification accuracy, significantly outperforming existing speech-based mental health assessment methods.

📝 Abstract
Depression and Attention Deficit Hyperactivity Disorder (ADHD) are among the most common mental health challenges today. In affective computing, speech signals serve as effective biomarkers for mental disorder assessment. Current research relies on labor-intensive hand-crafted features or simplistic time-frequency representations, and often overlooks critical details because it does not account for the differing impacts of individual frequency bands and temporal fluctuations. We therefore propose a frequency-aware augmentation network with dynamic convolution for depression and ADHD assessment. The proposed method takes the spectrogram as the input feature and applies a multi-scale convolution to help the network focus on discriminative frequency bands related to mental disorders. A dynamic convolution is also designed to capture dynamic information by aggregating multiple convolution kernels according to their attentions, which are input-independent. Finally, a feature augmentation block is proposed to enhance the feature representation ability and make full use of the captured information. Experimental results on AVEC 2014 and a self-recorded ADHD dataset demonstrate the robustness of our method: it attains an RMSE of 9.23 for estimating depression severity and an accuracy of 89.8% in detecting ADHD.
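The kernel-aggregation step that the abstract attributes to the dynamic convolution can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a 1-D signal (for example, one frequency bin of a spectrogram over time) rather than a full 2-D spectrogram, and the function names `softmax`, `aggregate_kernels`, `conv1d_valid`, and `dynamic_conv1d` are hypothetical. The abstract does not say how the attention scores are obtained, so the sketch simply takes them as an argument.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_kernels(kernels, attention_logits):
    """Blend K equally sized kernels into one kernel.

    The blend weights are the softmax of the attention logits, so they
    are non-negative and sum to 1. How the logits are produced is an
    assumption left to the caller.
    """
    weights = softmax(attention_logits)
    size = len(kernels[0])
    return [
        sum(w * kernel[i] for w, kernel in zip(weights, kernels))
        for i in range(size)
    ]


def conv1d_valid(signal, kernel):
    """'Valid' 1-D cross-correlation (no padding, stride 1)."""
    steps = len(signal) - len(kernel) + 1
    return [
        sum(signal[t + j] * kernel[j] for j in range(len(kernel)))
        for t in range(steps)
    ]


def dynamic_conv1d(signal, kernels, attention_logits):
    """Aggregate the kernels first, then run a single convolution."""
    return conv1d_valid(signal, aggregate_kernels(kernels, attention_logits))


# Illustrative values only: one edge-like kernel and one smoothing kernel,
# blended with equal attention.
example_kernels = [[1.0, 0.0, -1.0], [1 / 3, 1 / 3, 1 / 3]]
example_output = dynamic_conv1d([0.0, 0.0, 3.0, 0.0, 0.0], example_kernels, [0.0, 0.0])
```

Because convolution is linear in the kernel, blending the kernels before convolving gives the same result as convolving with each kernel and blending the outputs, at the cost of one convolution instead of K.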
Problem

Research questions and friction points this paper is trying to address.

Speech Analysis
Mental Health
Depression and ADHD Detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforced Neural Network
Dynamic Convolution Technique
Mental Health Assessment
Shuanglin Li
Intelligent Sensing and Communications Research Group, Newcastle University, U.K.
Siyang Song
School of Computing and Mathematical Sciences, University of Leicester, U.K.
Rajesh Nair
Cumbria, Northumberland, Tyne and Wear, CNTW-NHS Foundation Trust, U.K.
Syed Mohsen Naqvi
School of Engineering, Newcastle University, United Kingdom
Multimodal (multi-sensor) Signal and Information Processing