ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability

📅 2025-05-08
🤖 AI Summary
Traditional speech restoration (SR) methods often over-suppress reverberation, discarding spatial information about the source environment. To address this, we propose a generative SR framework with controllable reverberation modeling: it can preserve, edit, and synthesize reverberation while still denoising at high fidelity. The method combines a reverberation-feature disentanglement encoder (ReverbEncoder) with a conditional neural vocoder and trains them with a random zero-vector replacement strategy. The resulting latent space supports reverberation interpolation, replacement, and generative control. Objective and subjective evaluations show that the approach outperforms both two-stage SR and RIR-simulation-based convolutional baselines in denoising fidelity, reverberation preservation, and novel reverberation synthesis. To our knowledge, this is the first work to explicitly model and flexibly control reverberation within speech restoration.

📝 Abstract
Reverberation encodes spatial information about the acoustic source environment, yet traditional Speech Restoration (SR) usually removes reverberation completely. We propose ReverbMiipher, an SR model that extends the parametric resynthesis framework and is designed to denoise speech while preserving reverberation and enabling control over it. ReverbMiipher incorporates a dedicated ReverbEncoder to extract a reverb feature vector from the noisy input. This feature conditions a vocoder to reconstruct the speech signal, removing noise while retaining the original reverberation characteristics. A stochastic zero-vector replacement strategy during training ensures that the feature specifically encodes reverberation, disentangling it from other speech attributes. The learned representation enables reverberation control through interpolation between features, replacement with features from other utterances, or sampling from a latent space. Objective and subjective evaluations confirm that ReverbMiipher effectively preserves reverberation, removes other artifacts, and outperforms the conventional approach of two-stage SR and convolution with a simulated room impulse response. We further demonstrate its ability to generate novel reverberation effects through feature manipulation.
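The stochastic zero-vector replacement mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name `apply_zero_vector_replacement`, the `replace_prob` parameter, and the plain-list feature representation are hypothetical, and the abstract does not state the replacement probability or how the training target changes when the feature is zeroed.

```python
import random


def apply_zero_vector_replacement(reverb_feature, replace_prob, rng):
    """Return (conditioning_vector, was_replaced).

    With probability `replace_prob` the reverb feature is swapped for an
    all-zero vector of the same length; otherwise it is passed through.
    Hypothetical helper: the abstract gives no probability value and does
    not say how the training target is chosen in the zeroed case.
    """
    if not 0.0 <= replace_prob <= 1.0:
        raise ValueError("replace_prob must be in [0, 1]")
    if rng.random() < replace_prob:
        return [0.0] * len(reverb_feature), True
    return list(reverb_feature), False


# Toy 4-dimensional feature; a real ReverbEncoder output would be larger.
rng = random.Random(0)
feature = [0.3, -1.2, 0.8, 0.5]

# replace_prob=1.0 always zeroes the feature, 0.0 never does.
always_zero, replaced_flag = apply_zero_vector_replacement(feature, 1.0, rng)
never_zero, kept_flag = apply_zero_vector_replacement(feature, 0.0, rng)
```

In a training loop, the returned vector would take the place of the encoder output as the vocoder's conditioning input for that step.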
Problem

Research questions and friction points this paper is trying to address.

Denoise speech while preserving reverberation characteristics
Enable control over reverberation via feature manipulation
Disentangle reverberation from other speech attributes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends parametric resynthesis for controlled reverberation preservation
Uses ReverbEncoder to extract and condition reverb features
Enables reverberation control via feature manipulation techniques
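Two of the feature manipulation techniques named in the abstract, interpolation and replacement, can be sketched on plain vectors. This is a minimal sketch under stated assumptions: linear blending is assumed (the abstract only says "interpolation between features"), and `interpolate_features`, `feat_room_a`, and `feat_room_b` are hypothetical names, not part of the paper.

```python
def interpolate_features(feat_a, feat_b, alpha):
    """Linearly blend two reverb feature vectors.

    alpha=0.0 returns feat_a, alpha=1.0 returns feat_b.
    Linear blending is an assumption made for this sketch.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(feat_a, feat_b)]


# Toy features standing in for ReverbEncoder outputs of two utterances.
feat_room_a = [0.0, 2.0]
feat_room_b = [2.0, 4.0]

# Interpolation: a reverb character halfway between the two recordings.
halfway = interpolate_features(feat_room_a, feat_room_b, 0.5)

# Replacement: condition utterance A's reconstruction on B's feature.
replaced = list(feat_room_b)
```

Either result would then be passed to the vocoder as the conditioning vector in place of the feature extracted from the input utterance.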