No Audiogram: Leveraging Existing Scores for Personalized Speech Intelligibility Prediction

📅 2025-05-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional personalized speech intelligibility prediction relies on pure-tone audiograms, which poorly reflect actual speech understanding ability. This work abandons the audiogram-based paradigm and instead proposes the first approach that leverages a user's historical speech intelligibility scores for personalization. We introduce SSIPNet, a sample-driven framework that integrates semantic representations from a pre-trained speech foundation model and employs contrastive learning combined with meta-learning to enable few-shot cross-audio prediction. With only 3–5 support samples, i.e., (audio clip, intelligibility score) pairs, SSIPNet achieves significant improvements over audiogram-based baselines on the Clarity Prediction Challenge dataset, reducing mean prediction error by 28.6%. This establishes a novel, low-resource paradigm for personalized speech intelligibility assessment.
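The core idea, predicting a listener's score on new audio from a handful of their known (audio, score) pairs, can be sketched in a few lines. This is a minimal illustration, not the paper's SSIPNet architecture: the encoder here is a hypothetical stand-in for a speech foundation model, and the similarity-weighted pooling is one simple way to combine support scores.

```python
import numpy as np

def embed(audio: np.ndarray) -> np.ndarray:
    """Stand-in for a speech foundation model encoder (hypothetical).
    Here: a fixed random projection of the waveform to a 16-dim unit vector."""
    rng = np.random.default_rng(0)  # fixed seed so every call shares one projection
    proj = rng.standard_normal((16, audio.shape[0]))
    v = proj @ audio
    return v / (np.linalg.norm(v) + 1e-8)

def predict_intelligibility(support, query_audio, temperature=0.1):
    """Attention-style pooling over support (audio, score) pairs:
    the query's predicted score is a similarity-weighted average
    of the listener's known scores."""
    q = embed(query_audio)
    sims = np.array([q @ embed(a) for a, _ in support])
    weights = np.exp(sims / temperature)
    weights /= weights.sum()
    scores = np.array([s for _, s in support])
    return float(weights @ scores)

# Few-shot usage: 3 support pairs for one listener, one unseen clip.
support = [
    (np.sin(np.linspace(0, 1, 100)), 0.2),
    (np.sin(np.linspace(0, 5, 100)), 0.8),
    (np.sin(np.linspace(0, 9, 100)), 0.6),
]
query = np.sin(np.linspace(0, 5.1, 100))
pred = predict_intelligibility(support, query)
```

Because the prediction is a convex combination of the support scores, it always stays within the range of scores the listener has already produced; the paper's learned model is free of that restriction.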

📝 Abstract
Personalized speech intelligibility prediction is challenging. Previous approaches have mainly relied on audiograms, which are inherently limited in accuracy as they only capture a listener's hearing threshold for pure tones. Rather than incorporating additional listener features, we propose a novel approach that leverages an individual's existing intelligibility data to predict their performance on new audio. We introduce the Support Sample-Based Intelligibility Prediction Network (SSIPNet), a deep learning model that leverages speech foundation models to build a high-dimensional representation of a listener's speech recognition ability from multiple support (audio, score) pairs, enabling accurate predictions for unseen audio. Results on the Clarity Prediction Challenge dataset show that, even with a small number of support (audio, score) pairs, our method outperforms audiogram-based predictions. Our work presents a new paradigm for personalized speech intelligibility prediction.
Problem

Research questions and friction points this paper is trying to address.

Predict speech intelligibility without audiograms
Use existing listener scores for personalized prediction
Outperform audiogram-based methods with limited data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages existing intelligibility data for predictions
Uses deep learning model SSIPNet for high-dimensional representation
Outperforms audiogram-based methods with few support pairs
Authors
- Haoshuai Zhou, Orka Labs Inc., China
- Changgeng Mo, Orka Labs Inc., China
- Boxuan Cao, Orka Lab Inc.
- Linkai Li, Head of Engineering, Orka Inc
- Shan Xiang Wang, Materials Science and Engineering, Stanford University, United States