🤖 AI Summary
Traditional personalized speech intelligibility prediction relies on pure-tone audiograms, which capture only hearing thresholds and thus poorly reflect actual speech understanding ability. This work abandons the audiogram-based paradigm and instead proposes the first approach that leverages users' historical speech intelligibility scores for personalization. We introduce SSIPNet, a sample-driven framework that integrates semantic representations from a pre-trained speech foundation model and employs contrastive learning combined with meta-learning to enable few-shot cross-audio prediction. With only 3–5 support samples, i.e. (audio clip, intelligibility score) pairs, SSIPNet achieves significant improvements over audiogram-based baselines on the Clarity Prediction Challenge dataset, reducing mean prediction error by 28.6%. This establishes a novel, low-resource paradigm for personalized speech intelligibility assessment.
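The summary describes a support-sample architecture: embed each of a listener's support clips with a speech foundation model, pair each embedding with its known intelligibility score, and let a new (query) clip condition on those pairs to form a listener-specific representation. The sketch below illustrates that idea in PyTorch; it is a minimal illustration, not the paper's actual SSIPNet implementation. The class and parameter names (`SupportSetPredictor`, `emb_dim`, `hidden`) are hypothetical, and the attention-based aggregation stands in for the contrastive/meta-learning machinery the summary mentions.

```python
import torch
import torch.nn as nn

class SupportSetPredictor(nn.Module):
    """Minimal sketch: predict a query clip's intelligibility score from a
    listener's support (embedding, score) pairs. Architecture is illustrative,
    not the published SSIPNet design."""
    def __init__(self, emb_dim: int = 768, hidden: int = 256):
        super().__init__()
        # Project pooled foundation-model embeddings (one vector per clip).
        self.proj = nn.Linear(emb_dim, hidden)
        # Encode each support pair: clip embedding concatenated with its score.
        self.pair_enc = nn.Sequential(nn.Linear(hidden + 1, hidden), nn.ReLU())
        # Cross-attention: the query clip attends over the encoded support pairs.
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, support_emb, support_scores, query_emb):
        # support_emb: (B, K, emb_dim), support_scores: (B, K), query_emb: (B, emb_dim)
        s = self.proj(support_emb)                                          # (B, K, hidden)
        pairs = self.pair_enc(torch.cat([s, support_scores.unsqueeze(-1)], dim=-1))
        q = self.proj(query_emb).unsqueeze(1)                               # (B, 1, hidden)
        listener_repr, _ = self.attn(q, pairs, pairs)                       # (B, 1, hidden)
        fused = torch.cat([q, listener_repr], dim=-1).squeeze(1)            # (B, 2*hidden)
        return self.head(fused).squeeze(-1)                                 # score in [0, 1]

# Toy usage: K = 3 support clips per listener; random tensors stand in for
# pre-pooled speech foundation model features.
model = SupportSetPredictor()
support_emb = torch.randn(2, 3, 768)   # batch of 2 listeners
support_scores = torch.rand(2, 3)      # known intelligibility scores in [0, 1]
query_emb = torch.randn(2, 768)        # one unseen audio clip per listener
pred = model(support_emb, support_scores, query_emb)
print(pred.shape)  # torch.Size([2])
```

Under these assumptions, the attention output plays the role of the listener's "speech recognition ability" representation built from the support set; the prediction head then maps the query embedding plus that representation to a single intelligibility score.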
📝 Abstract
Personalized speech intelligibility prediction is challenging. Previous approaches have mainly relied on audiograms, which are inherently limited in accuracy because they capture only a listener's hearing thresholds for pure tones. Rather than incorporating additional listener features, we propose a novel approach that leverages an individual's existing intelligibility data to predict their performance on new audio. We introduce the Support Sample-Based Intelligibility Prediction Network (SSIPNet), a deep learning model that uses speech foundation models to build a high-dimensional representation of a listener's speech recognition ability from multiple support (audio, score) pairs, enabling accurate predictions for unseen audio. Results on the Clarity Prediction Challenge dataset show that, even with a small number of support (audio, score) pairs, our method outperforms audiogram-based predictions. Our work presents a new paradigm for personalized speech intelligibility prediction.