Addressing Pitfalls in Auditing Practices of Automatic Speech Recognition Technologies: A Case Study of People with Aphasia

📅 2025-06-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing ASR fairness audits suffer from three critical limitations: (1) reliance on a single text normalization method, (2) coarse-grained demographic analysis, and (3) overreliance on word error rate (WER) while neglecting generative errors, such as hallucinations. This paper introduces the first multidimensional fairness audit framework tailored to speakers with disabilities, notably aphasia. Our approach innovatively integrates: (1) multi-strategy text normalization to expose normalization-induced bias; (2) fine-grained subgroup analysis jointly modeling demographic and acoustic covariates; and (3) an expanded evaluation metric suite incorporating semantic fidelity and hallucination detection. Empirical audits across six state-of-the-art ASR systems reveal that all exhibit significant performance degradation on aphasic speakers, and conventional methods underestimate error-type diversity and individual variability by over 40%. Our framework increases detection rates of critical fairness issues by more than 40%, establishing a reproducible, interpretable audit paradigm for disability-inclusive speech technology.

πŸ“ Abstract
Automatic Speech Recognition (ASR) has transformed daily tasks from video transcription to workplace hiring. ASR systems' growing use warrants robust and standardized auditing approaches to ensure automated transcriptions of high and equitable quality. This is especially critical for people with speech and language disorders (such as aphasia) who may disproportionately depend on ASR systems to navigate everyday life. In this work, we identify three pitfalls in existing standard ASR auditing procedures, and demonstrate how addressing them impacts audit results via a case study of six popular ASR systems' performance for aphasia speakers. First, audits often adhere to a single method of text standardization during data pre-processing, which (a) masks variability in ASR performance from applying different standardization methods, and (b) may not be consistent with how users, especially those from marginalized speech communities, would want their transcriptions to be standardized. Second, audits often display high-level demographic findings without further considering performance disparities among (a) more nuanced demographic subgroups, and (b) relevant covariates capturing acoustic information from the input audio. Third, audits often rely on a single gold-standard metric, the Word Error Rate, which does not fully capture the extent of errors arising from generative AI models, such as transcription hallucinations. We propose a more holistic auditing framework that accounts for these three pitfalls, and exemplify its results in our case study, finding consistently worse ASR performance for aphasia speakers relative to a control group. We call on practitioners to implement these robust ASR auditing practices that remain flexible to the rapidly changing ASR landscape.
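The first and third pitfalls interact: the WER a system is assigned depends on how reference and hypothesis are normalized before scoring. A minimal sketch of that effect (the normalizers and example strings below are illustrative assumptions, not the paper's actual pipeline):

```python
import string

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[-1][-1] / max(len(ref), 1)

def normalize_aggressive(text: str) -> str:
    # Lowercase and strip all punctuation before scoring.
    return text.lower().translate(str.maketrans("", "", string.punctuation))

def normalize_light(text: str) -> str:
    # Lowercase only; punctuation stays in the token stream.
    return text.lower()

reference = "Well, I... I went to the store."
hypothesis = "well i i went to the store"

print(wer(normalize_aggressive(reference), normalize_aggressive(hypothesis)))  # 0.0
print(round(wer(normalize_light(reference), normalize_light(hypothesis)), 2))  # 0.43
```

The same hypothesis scores as a perfect transcript or as 43% WER depending on normalization alone, which is why the framework reports results under multiple standardization strategies rather than one.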
Problem

Research questions and friction points this paper is trying to address.

Identifies pitfalls in ASR auditing for speech disorders
Highlights lack of nuanced demographic and acoustic analysis
Critiques overreliance on Word Error Rate metric
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiple text standardization methods in pre-processing
Detailed demographic and acoustic subgroup analysis
Holistic metrics beyond Word Error Rate
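The paper does not reproduce its hallucination detector on this page, but a crude stand-in shows why WER alone misses this failure mode: a hypothesis that fabricates or loops content can be flagged by length inflation and repeated n-grams. The function name, thresholds, and examples below are assumptions for illustration:

```python
from collections import Counter

def looks_hallucinated(reference: str, hypothesis: str,
                       length_ratio: float = 1.5, repeat_limit: int = 3) -> bool:
    """Crude hallucination heuristic: flag transcripts that are much longer
    than the reference or that loop the same trigram repeatedly."""
    ref, hyp = reference.split(), hypothesis.split()
    # A hypothesis far longer than the reference suggests fabricated content.
    if len(hyp) > length_ratio * max(len(ref), 1):
        return True
    # The same trigram appearing many times suggests a looping decoder.
    trigrams = Counter(zip(hyp, hyp[1:], hyp[2:]))
    return any(count >= repeat_limit for count in trigrams.values())

faithful = looks_hallucinated("the cat sat on the mat", "the cat sat on a mat")
looping = looks_hallucinated("the cat sat",
                             "the cat sat thanks for watching thanks for watching")
print(faithful, looping)  # False True
```

The faithful transcript has a nonzero WER but is not flagged, while the looping one would be caught even if a permissive normalizer hid some of its errors.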