InfoShield: Privacy-Preserving Speech Representations for Mental Health Screening via Information-Theoretic Optimization

📅 2026-06-03

🤖 AI Summary

This study addresses the leakage of sensitive demographic information, such as gender and age, through speech representations used for depression screening, a privacy risk that hinders clinical deployment. To mitigate it, the authors propose an approach grounded in the information bottleneck principle that minimizes the mutual information between speech embeddings and sensitive attributes while preserving depression classification performance. They introduce a TimeAwareMINE estimator that uses cross-modal attention to align time-varying speech frames with time-invariant demographic attributes, addressing the temporal-static misalignment that limits standard MINE estimators on sequential speech. Evaluated on the Androids Corpus, the method reduces gender inference accuracy from 92.6% to 55.5% and age inference accuracy from 55.7% to 30.3%, at the cost of a 6% reduction in depression classification F1. It achieves F1 = 0.784, compared with 0.723 for the prior state of the art.
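To make the alignment idea concrete, here is a minimal sketch of how a static attribute embedding could act as an attention query over a variable-length sequence of speech frames, yielding one fixed-size vector per utterance. This is an illustration under stated assumptions, not the paper's TimeAwareMINE architecture: the function name `attention_pool`, the scaled dot-product scoring, and all dimensions are hypothetical.

```python
import numpy as np

def attention_pool(frames: np.ndarray, attr_emb: np.ndarray) -> np.ndarray:
    """Pool a (T, d) frame sequence into one (d,) vector, using a static
    attribute embedding of shape (d,) as the attention query.
    Hypothetical helper, not taken from the paper."""
    d = frames.shape[1]
    scores = frames @ attr_emb / np.sqrt(d)   # (T,) scaled dot-product scores
    scores = scores - scores.max()            # stabilise the softmax numerically
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ frames                   # convex combination of the frames

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 8))   # 50 acoustic frames, 8 dims each (made-up sizes)
attr_emb = rng.normal(size=8)       # embedding of a demographic label (made-up)
pooled = attention_pool(frames, attr_emb)
print(pooled.shape)
```

The pooled vector has the same shape regardless of how many frames the utterance contains, which is what lets a time-varying input be paired with a time-invariant attribute inside a mutual information estimator.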

📝 Abstract

Speech-based mental health screening offers scalable depression detection, yet clinical deployment faces a significant barrier: users' privacy concerns about demographic information exposure. Current techniques struggle to resolve this conflict. Adversarial training often fails against unseen threats, whereas Differential Privacy tends to compromise diagnostic performance by injecting noise across all features. This paper presents InfoShield, which minimizes mutual information between speech representations and sensitive attributes while preserving depression classification accuracy. We identify that standard MINE estimators struggle with sequential speech due to temporal-static misalignment, and introduce TimeAwareMINE with cross-modal attention to align acoustic frames with attribute embeddings. Experiments on the Androids Corpus show InfoShield reduces gender inference from 92.6% to 55.5% and age inference from 55.7% to 30.3% with limited utility loss (6% F1 reduction), achieving F1=0.784 compared to prior SOTA's 0.723.
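For readers unfamiliar with MINE, the quantity it estimates can be sketched with the Donsker-Varadhan lower bound on mutual information, I(Z; A) >= E_joint[T] - log E_marginal[exp(T)], where T is a critic function. The example below is a toy illustration only: it uses a fixed bilinear critic on synthetic scalar data rather than a trained network, and the names `dv_lower_bound` and `critic` are hypothetical. It is not the authors' implementation.

```python
import numpy as np

def dv_lower_bound(t_joint, t_marginal) -> float:
    """Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)]."""
    t_joint = np.asarray(t_joint, dtype=float)
    t_marginal = np.asarray(t_marginal, dtype=float)
    m = t_marginal.max()  # shift by the max for a stable log-mean-exp
    log_mean_exp = m + np.log(np.mean(np.exp(t_marginal - m)))
    return float(t_joint.mean() - log_mean_exp)

def critic(z, a):
    # Fixed bilinear critic (assumption); MINE would train a neural network here.
    return z * a

rng = np.random.default_rng(0)
n = 5000
a = rng.choice([-1.0, 1.0], size=n)   # synthetic binary sensitive attribute
noise = 0.5 * rng.normal(size=n)
z_leaky = a + noise                   # representation that encodes the attribute
z_clean = noise                       # representation independent of the attribute
a_shuffled = rng.permutation(a)       # shuffling breaks the pairing (marginal samples)

leaky_bound = dv_lower_bound(critic(z_leaky, a), critic(z_leaky, a_shuffled))
clean_bound = dv_lower_bound(critic(z_clean, a), critic(z_clean, a_shuffled))
print(leaky_bound, clean_bound)
```

In this toy setting the bound is clearly larger for the leaky representation than for the clean one. A privacy objective of the kind described above would train the encoder to push such an estimate down while a separate task loss preserves depression classification performance.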

Problem

Research questions and friction points this paper is trying to address.

privacy-preserving

mental health screening

speech representations

sensitive attributes

demographic information

Innovation

Methods, ideas, or system contributions that make the work stand out.

InfoShield

privacy-preserving representation

mutual information minimization