LIWhiz: A Non-Intrusive Lyric Intelligibility Prediction System for the Cadenza Challenge

📅 2025-12-11
🤖 AI Summary
This study addresses non-intrusive automatic assessment of lyric intelligibility in singing voice. The proposed end-to-end prediction framework requires neither clean reference lyrics nor forced-alignment information. It adapts Whisper, a model originally designed for speech recognition, to lyric intelligibility modeling, leveraging its robust phonetic and prosodic representations. A lightweight, trainable, differentiable regression back-end maps these representations to an intelligibility score and is intended to keep scoring stable across diverse singing styles. Because the approach does not rely on time-aligned ground truth, it can be applied plug-and-play at evaluation time, when no reference is available. On the Cadenza CLIP evaluation set, the method reduces RMSE by 22.4% relative to the STOI-based baseline and achieves a substantially higher normalized cross-correlation (NCC), indicating that it generalizes to performances not seen during training.
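The summary describes a trainable regression back-end on top of Whisper features but does not specify it. The sketch below shows one plausible shape for such a back-end, not the authors' design. It assumes frame-level feature vectors have already been extracted (toy numbers stand in for Whisper encoder outputs), mean-pools them over time, and maps the pooled vector to a score in [0, 1] with a linear layer and a sigmoid, trained by full-batch gradient descent on squared error. `RegressionBackend`, `mean_pool`, and all hyperparameters are hypothetical, and the [0, 1] output range assumes intelligibility is expressed as the proportion of correctly identified words.

```python
import math
import random
from typing import List, Sequence

# Hypothetical stand-in for frame-level encoder features (e.g. Whisper encoder
# outputs): an utterance is a list of frames, each frame a list of floats.
Utterance = List[List[float]]


def mean_pool(frames: Utterance) -> List[float]:
    """Average frame vectors over time into one fixed-size utterance vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def sigmoid(z: float) -> float:
    """Squash a logit into (0, 1) so scores behave like proportions."""
    return 1.0 / (1.0 + math.exp(-z))


class RegressionBackend:
    """Illustrative back-end: mean pooling + linear layer + sigmoid (assumed design)."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        self.b = 0.0

    def predict(self, frames: Utterance) -> float:
        x = mean_pool(frames)
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, data: Sequence[Utterance], targets: Sequence[float],
            lr: float = 0.5, epochs: int = 500) -> None:
        """Full-batch gradient descent on mean squared error."""
        pooled = [mean_pool(u) for u in data]
        n = len(pooled)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, t in zip(pooled, targets):
                p = sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)
                # Derivative of (p - t)^2 w.r.t. the logit, with p = sigmoid(logit).
                g = 2.0 * (p - t) * p * (1.0 - p)
                for d, xd in enumerate(x):
                    grad_w[d] += g * xd
                grad_b += g
            self.w = [wi - lr * gw / n for wi, gw in zip(self.w, grad_w)]
            self.b -= lr * grad_b / n
```

Mean pooling is used here only because it is the simplest way to turn a variable-length frame sequence into a fixed-size vector; attention pooling or a small recurrent layer are common alternatives, and the summary does not say which the paper uses.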


📝 Abstract
We present LIWhiz, a non-intrusive lyric intelligibility prediction system submitted to the ICASSP 2026 Cadenza Challenge. LIWhiz leverages Whisper for robust feature extraction and a trainable back-end for score prediction. Evaluated on the Cadenza Lyric Intelligibility Prediction (CLIP) evaluation set, LIWhiz achieves a 22.4% relative reduction in root mean squared error (RMSE) over the STOI-based baseline, along with a substantial improvement in normalized cross-correlation (NCC).
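The abstract reports results in terms of RMSE, NCC, and a relative RMSE reduction. The sketch below shows how these quantities are commonly computed between predicted and listener-derived scores. It assumes NCC is the zero-mean normalized cross-correlation at lag 0, which equals Pearson's correlation coefficient; the challenge's official evaluation code may define or implement it differently. The function names are illustrative, and no values from the paper are used.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean squared error between predicted and reference scores."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def ncc(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Zero-mean normalized cross-correlation at lag 0 (equals Pearson's r).

    Assumption: the challenge's NCC may be implemented differently.
    """
    mp = sum(pred) / len(pred)
    mr = sum(ref) / len(ref)
    num = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    den = math.sqrt(sum((p - mp) ** 2 for p in pred)
                    * sum((r - mr) ** 2 for r in ref))
    return num / den


def relative_reduction(baseline: float, system: float) -> float:
    """Relative error reduction in percent: 100 * (baseline - system) / baseline."""
    return 100.0 * (baseline - system) / baseline
```

A 22.4% relative reduction means the system's RMSE is 0.776 times the baseline's RMSE, so `relative_reduction(1.0, 0.776)` returns approximately 22.4 for a hypothetical baseline RMSE of 1.0.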
Problem


Predicting lyric intelligibility in music non-intrusively, without a clean reference signal
Improving accuracy over the STOI-based baseline
Using Whisper features and a trainable back-end for score prediction
Innovation


Uses Whisper for robust feature extraction
Employs a trainable back-end for score prediction
Achieves a 22.4% relative RMSE reduction over the STOI-based baseline
R. Shekar
Department of Signal Theory, Telematics and Communications, University of Granada, Spain
Iván López-Espejo
University of Granada
Research interests: Robust speech recognition, Statistical signal processing, Speaker verification, Speech enhancement