Perceptual Implications of Automatic Anonymization in Pathological Speech

📅 2025-05-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Automatic speech anonymization lacks empirical validation of its perceptual fidelity for pathological speech, hindering its ethical use in clinical data sharing. Method: We conducted a human-machine Turing-style auditory experiment involving four pathological groups (cleft lip and palate, dysarthria, dysglossia, dysphonia) and age-matched healthy controls, employing state-of-the-art anonymization models (EER 30–40%). Listeners with diverse linguistic, clinical, and technical backgrounds completed zero-shot (single-exposure) and few-shot (repeated-exposure) discrimination and quality rating tasks, analyzed with repeated-measures ANOVA and cross-group comparisons. Contribution/Results: Listeners distinguished anonymized from original speech with high accuracy (91% zero-shot; 93% few-shot), yet perceived quality dropped significantly after anonymization (83% → 59%, *p* < 0.001), with pathology-specific degradation patterns across disorder types (*p* = 0.005), and the native-listener advantage vanished post-anonymization. Crucially, automated privacy metrics showed no correlation with human perception. These findings provide the first empirical basis for perceptually grounded standards for pathological speech anonymization.
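For context on the EER figures quoted above: equal error rate is the operating point where a speaker verifier's false-acceptance and false-rejection rates coincide, so 30–40% on anonymized speech means the verifier is close to chance at re-identifying speakers. A minimal sketch of the standard computation, using synthetic placeholder scores rather than the paper's data:

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(scores, labels):
    """EER: the operating point where false-accept rate equals false-reject rate.

    scores: similarity scores from a speaker verifier (higher = more likely same speaker).
    labels: 1 for genuine (same-speaker) trials, 0 for impostor trials.
    """
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))  # point where the two error-rate curves cross
    return (fpr[idx] + fnr[idx]) / 2

# Hypothetical verification trials against anonymized speech:
rng = np.random.default_rng(0)
genuine = rng.normal(0.6, 0.3, 500)   # same-speaker trial scores
impostor = rng.normal(0.4, 0.3, 500)  # different-speaker trial scores
scores = np.concatenate([genuine, impostor])
labels = np.concatenate([np.ones(500), np.zeros(500)])
print(f"EER ≈ {equal_error_rate(scores, labels):.1%}")  # heavy overlap gives ~30-40%
```

A higher EER after anonymization indicates stronger privacy protection; the paper's central point is that this automatic metric did not track human perceptual judgments.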

📝 Abstract
Automatic anonymization techniques are essential for ethical sharing of pathological speech data, yet their perceptual consequences remain understudied. This study presents the first comprehensive human-centered analysis of anonymized pathological speech, using a structured perceptual protocol involving ten native and non-native German listeners with diverse linguistic, clinical, and technical backgrounds. Listeners evaluated anonymized-original utterance pairs from 180 speakers spanning Cleft Lip and Palate, Dysarthria, Dysglossia, Dysphonia, and age-matched healthy controls. Speech was anonymized using state-of-the-art automatic methods (equal error rates in the range of 30–40%). Listeners completed Turing-style discrimination and quality rating tasks under zero-shot (single-exposure) and few-shot (repeated-exposure) conditions. Discrimination accuracy was high overall (91% zero-shot; 93% few-shot), but varied by disorder (repeated-measures ANOVA: p = 0.007), ranging from 96% (Dysarthria) to 86% (Dysphonia). Anonymization consistently reduced perceived quality (from 83% to 59%, p < 0.001), with pathology-specific degradation patterns (one-way ANOVA: p = 0.005). Native listeners rated original speech slightly higher than non-native listeners (Δ = 4%, p = 0.199), but this difference nearly disappeared after anonymization (Δ = 1%, p = 0.724). No significant gender-based bias was observed. Critically, human perceptual outcomes did not correlate with automatic privacy or clinical utility metrics. These results underscore the need for listener-informed, disorder- and context-specific anonymization strategies that preserve privacy while maintaining interpretability, communicative functions, and diagnostic utility, especially for vulnerable populations such as children.
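The abstract's group comparisons could be reproduced on rating data with standard tools. A minimal sketch, assuming a hypothetical table of listener quality ratings; the paper reports a repeated-measures ANOVA, which this simplified analogue approximates with a paired t-test plus a one-way ANOVA on per-group quality drops:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical quality ratings (%), one row per listener x group;
# column names are illustrative, not the paper's actual data schema.
rng = np.random.default_rng(1)
groups = ["CLP", "Dysarthria", "Dysglossia", "Dysphonia", "Control"]
rows = []
for listener in range(10):
    for group in groups:
        rows.append({"listener": listener, "group": group,
                     "original": rng.normal(83, 6),
                     "anonymized": rng.normal(59, 8)})
df = pd.DataFrame(rows)

# Paired test: does anonymization reduce perceived quality overall?
t, p = stats.ttest_rel(df["original"], df["anonymized"])
print(f"paired t-test: t = {t:.2f}, p = {p:.3g}")

# One-way ANOVA: does the size of the quality drop differ across disorder groups?
df["drop"] = df["original"] - df["anonymized"]
samples = [g["drop"].values for _, g in df.groupby("group")]
f, p = stats.f_oneway(*samples)
print(f"one-way ANOVA across groups: F = {f:.2f}, p = {p:.3g}")
```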
Problem

Research questions and friction points this paper is trying to address.

Evaluates perceptual effects of anonymizing pathological speech data
Assesses listener discrimination accuracy across various speech disorders
Examines quality degradation patterns in anonymized pathological speech
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-centered analysis of anonymized pathological speech
Turing-style discrimination and quality rating tasks (see the sketch after this list)
Listener-informed disorder-specific anonymization strategies
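A minimal sketch of how the Turing-style discrimination task could be scored per disorder group and exposure condition; the trial structure and field names here are hypothetical, not the paper's protocol:

```python
from collections import defaultdict

# Hypothetical trials: each pairs an original and an anonymized utterance,
# and the listener must identify which one is anonymized.
trials = [
    {"group": "Dysarthria", "condition": "zero-shot", "correct": True},
    {"group": "Dysphonia",  "condition": "zero-shot", "correct": False},
    {"group": "Dysarthria", "condition": "few-shot",  "correct": True},
    # ... one entry per listener x utterance pair
]

hits = defaultdict(int)
totals = defaultdict(int)
for trial in trials:
    key = (trial["group"], trial["condition"])
    hits[key] += trial["correct"]  # bool counts as 0/1
    totals[key] += 1

for group, condition in sorted(totals):
    acc = hits[(group, condition)] / totals[(group, condition)]
    print(f"{group:12s} {condition:9s} accuracy = {acc:.0%}")
```

Aggregating accuracy this way per group and condition is what makes the reported spread (96% for Dysarthria down to 86% for Dysphonia) visible.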
Soroosh Tayebi Arasteh
RWTH Aachen University
Deep Learning · AI in Medicine · Generative AI · Medical Image Analysis
Saba Afza
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Tri-Thien Nguyen
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Lukas Buess
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Maryam Parvin
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Tomás Arias-Vergara
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
P. A. Pérez-Toro
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Hiuching Hung
Department of Foreign Language Education, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Mahshad Lotfinia
RWTH Aachen University
Artificial Intelligence · Deep Learning · Medical Image Analysis
Thomas Gorges
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Elmar Noeth
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Maria Schuster
Department of Otorhinolaryngology, Head and Neck Surgery, Ludwig-Maximilians-Universität München, Munich, Germany
Seung Hee Yang
Speech & Language Processing Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Andreas Maier
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany