Read What You Hear: Reference-Free Hypotheses Evaluation with Acoustic Discrepancy

📅 2026-06-03
🤖 AI Summary
Current automatic speech recognition (ASR) systems rely on reference transcriptions for evaluation, and effective reference-free methods for assessing recognition hypotheses are lacking. This work proposes READ, a metric that makes acoustic consistency the core principle of reference-free ASR hypothesis evaluation. READ uses a pretrained autoregressive text-to-speech (TTS) model to compute the conditional likelihood of speech tokens given a hypothesized transcript, thereby quantifying fine-grained mismatches between the audio and the text. Without any additional training, the metric can be used to refine hypotheses: it correlates with specific recognition errors and achieves up to a 20% relative error-rate reduction, with particularly strong gains under noisy conditions.
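As a minimal sketch of the scoring idea, assume a teacher-forced autoregressive TTS model that returns per-token log-probabilities log p(s_t | s_<t, y) for the discrete speech tokens s extracted from the audio. The score below is then the mean negative log-likelihood of those tokens given a hypothesis y, and per-token values above a threshold mark where text and audio disagree. The callable `token_logprobs`, the helpers `read_style_score` and `mismatch_positions`, the threshold, and the toy word-level "TTS" are hypothetical illustrations, not the paper's implementation or any real library's API. The exact normalization READ uses is not stated here.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface (not from the paper or any library): given a hypothesis
# text and the discrete speech tokens of the audio, return log p(s_t | s_<t, text)
# for every speech token, e.g. from a teacher-forced autoregressive TTS model.
TokenLogProbFn = Callable[[str, Sequence[int]], List[float]]


def read_style_score(hypothesis: str, speech_tokens: Sequence[int],
                     token_logprobs: TokenLogProbFn) -> float:
    """Mean negative log-likelihood of the speech tokens given the hypothesis.

    Higher values mean a larger acoustic discrepancy between text and audio.
    Averaging over speech tokens is an assumption made for this sketch.
    """
    logps = token_logprobs(hypothesis, speech_tokens)
    if not logps or len(logps) != len(speech_tokens):
        raise ValueError("expected one log-probability per speech token")
    return -sum(logps) / len(logps)


def mismatch_positions(hypothesis: str, speech_tokens: Sequence[int],
                       token_logprobs: TokenLogProbFn,
                       threshold: float = 2.0) -> List[int]:
    """Indices of speech tokens whose negative log-likelihood exceeds `threshold`.

    One way to localize the discrepancy; the threshold value is hypothetical.
    """
    logps = token_logprobs(hypothesis, speech_tokens)
    return [t for t, lp in enumerate(logps) if -lp > threshold]


# Toy stand-in for a TTS model: one "speech token" per word, and the model is
# confident (p = 0.9) only when the hypothesis word at that position matches it.
VOCAB = {"read": 0, "what": 1, "you": 2, "hear": 3, "fear": 4}


def toy_token_logprobs(hypothesis: str, speech_tokens: Sequence[int]) -> List[float]:
    words = hypothesis.lower().split()
    out = []
    for t, tok in enumerate(speech_tokens):
        predicted = VOCAB.get(words[t]) if t < len(words) else None
        out.append(math.log(0.9) if predicted == tok else math.log(0.01))
    return out


audio_tokens = [0, 1, 2, 3]  # the toy audio says "read what you hear"
good = read_style_score("read what you hear", audio_tokens, toy_token_logprobs)
bad = read_style_score("read what you fear", audio_tokens, toy_token_logprobs)
print(f"{good:.3f} {bad:.3f}")  # prints "0.105 1.230"
print(mismatch_positions("read what you fear", audio_tokens, toy_token_logprobs))  # [3]
```

Because every hypothesis for an utterance is scored against the same audio, dividing by the number of speech tokens does not change the ranking between hypotheses; it only makes scores comparable across utterances of different lengths.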
πŸ“ Abstract
Automatic speech recognition systems commonly rely on reference transcriptions for evaluation, while reference-free approaches often depend on internal confidence estimation or auxiliary language models. We propose READ (Reference-free Hypothesis Evaluation with Acoustic Discrepancy), a novel metric that evaluates ASR hypotheses directly from the speech signal. READ emphasizes the acoustic grounding of hypotheses: it uses a pretrained autoregressive TTS model to compute the conditional likelihood of speech tokens given a text hypothesis, thereby measuring fine-grained acoustic discrepancy between speech and text. Without additional training, READ can be applied to hypothesis refinement. Experiments show that READ correlates with specific recognition errors and improves ASR outputs, achieving up to a 20% relative error rate reduction, with particularly strong gains under noisy conditions.
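The abstract does not spell out how hypotheses are refined. One plausible reading, sketched below under that assumption, is N-best reranking: each candidate's decoder log-probability is combined with a READ-style discrepancy score, and the best combined score wins. `Hypothesis`, `rerank`, `weight`, the linear combination, and `relative_reduction` are hypothetical names and choices, and the scores are made-up numbers, not results from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Hypothesis:
    text: str
    asr_logprob: float  # decoder log-probability; assumed available from the ASR system


def rerank(nbest: Sequence[Hypothesis],
           discrepancy: Callable[[str], float],
           weight: float = 1.0) -> List[Hypothesis]:
    """Order hypotheses best-first by ASR log-prob minus weighted discrepancy.

    `discrepancy` returns a READ-style score (lower = better acoustic match).
    The linear combination and `weight` are hypothetical choices for this sketch.
    """
    def combined(h: Hypothesis) -> float:
        return h.asr_logprob - weight * discrepancy(h.text)

    return sorted(nbest, key=combined, reverse=True)


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Relative error-rate reduction: (baseline - new) / baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_error - new_error) / baseline_error


# Made-up discrepancy scores, e.g. from a TTS-based scorer.
scores = {"read what you fear": 1.23, "read what you hear": 0.11}
nbest = [Hypothesis("read what you fear", -1.0), Hypothesis("read what you hear", -1.2)]
print(rerank(nbest, scores.__getitem__)[0].text)  # read what you hear
print(round(relative_reduction(0.10, 0.08), 2))  # 0.2
```

With `weight=0.0` the ranking falls back to the decoder's own order, so the weight controls how strongly the acoustic check can overrule the decoder. The last line shows what "relative" means in the abstract: with these illustrative numbers, going from 10% to 8% WER is a 20% relative reduction but only a 2-point absolute one.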
Problem

Research questions and friction points this paper is trying to address.

reference-free evaluation
automatic speech recognition
acoustic discrepancy
hypothesis evaluation
speech-text alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

reference-free evaluation
acoustic discrepancy
automatic speech recognition
text-to-speech
hypothesis refinement