A Comparison of Generative and Discriminative Methods for Speech Enhancement: Robustness, Complexity, and Hallucination

📅 2026-06-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study systematically compares generative and discriminative deep learning approaches to speech enhancement across signal-to-noise ratios, matched and mismatched training conditions, and dataset scales. It evaluates denoising performance, convergence speed, computational complexity, and speech hallucination (quantified via word error rate and phoneme similarity), and reports a multidimensional trade-off between robustness, efficiency, and perceptual quality. Generative models demonstrate superior perceptual quality but incur higher computational costs and greater hallucination risk than their discriminative counterparts. These findings provide empirical evidence and practical guidelines for selecting methods in real-world deployment.
📝 Abstract
In this study, we conduct a comprehensive comparative analysis of generative and discriminative deep learning-based speech enhancement methods, focusing on noise reduction tasks. Our investigation evaluates their effectiveness under high and low signal-to-noise ratio conditions, considering both matched and mismatched training scenarios. We further investigate the impact of training data volume and model convergence speed, and interpret the performance differences in terms of objective results for the considered training paradigms. Additionally, we compare the complexity-performance trade-off and the practical viability of these approaches. To further strengthen the evaluation, we study the hallucination characteristics of generative approaches in terms of word error rate and phoneme similarity. The insights derived from this study provide empirical evidence to assist researchers and practitioners in understanding whether the perceptual gains of different approaches justify their computational cost in practical applications.
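To make the word error rate metric mentioned in the abstract concrete, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis transcript, divided by the number of reference words. The function name `word_error_rate` and the simple lowercase/whitespace normalization are assumptions for illustration; the listing does not say which speech recognizer or scoring pipeline the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name), not the paper's scoring code.
    Normalization here is only lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on a mat"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

In a hallucination study of this kind, the hypothesis would typically be a recognizer's transcript of the enhanced audio, so words invented or altered by the enhancement model show up as extra insertions or substitutions relative to the clean reference.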
Problem

Research questions and friction points this paper is trying to address.

speech enhancement
generative methods
discriminative methods
hallucination
noise reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

speech enhancement
generative vs discriminative
hallucination
robustness
complexity-performance trade-off