How much to Dereverberate? Low-Latency Single-Channel Speech Enhancement in Distant Microphone Scenarios

📅 2025-04-06

🏛️ IEEE International Conference on Acoustics, Speech, and Signal Processing

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenging problem of low-latency single-channel speech enhancement in large-volume reverberant environments (e.g., conference rooms, theaters), characterized by far-field acquisition (5–10 m), high room volume (>1000 m³), and long reverberation times (T60 > 1 s). To tackle this, we first systematically demonstrate the feasibility of far-field single-channel speech enhancement. We propose an early-reflection-aware reverberation modeling and suppression strategy, departing from conventional full-reverberation suppression paradigms. A physics-informed random impulse response (RIR) simulation method is designed to explicitly model the coupled dependence of T60 on room volume—critical for realistic training data generation. Furthermore, we develop a lightweight, real-time deep time-frequency masking network. Experiments show substantial improvements: +2.1 in PESQ, +18.3% in STOI, and end-to-end latency <40 ms—achieving significant gains in speech intelligibility and naturalness while meeting strict real-time constraints.

Technology Category

Application Category

📝 Abstract

Dereverberation is an important sub-task of Speech Enhancement (SE) to improve the signal's intelligibility and quality. However, it remains challenging because the reverberation is highly correlated with the signal. Furthermore, the single-channel SE literature has predominantly focused on rooms with short reverb times (typically under 1 second), smaller rooms (under volumes of 1000 cubic meters) and relatively short distances (up to 2 meters). In this paper, we explore real-time low-latency single-channel SE under distant microphone scenarios, such as 5 to 10 meters, and focus on conference rooms and theatres, with larger room dimensions and reverberation times. Such a setup is useful for applications such as lecture demonstrations, drama, and to enhance stage acoustics. First, we show that single-channel SE in such challenging scenarios is feasible. Second, we investigate the relationship between room volume and reverberation time, and demonstrate its importance when randomly simulating room impulse responses. Lastly, we show that for dereverberation with short decay times, preserving early reflections before decaying the transfer function of the room improves overall signal quality.

Problem

Research questions and friction points this paper is trying to address.

Real-time low-latency single-channel speech enhancement in distant microphone scenarios

Exploring dereverberation in large rooms with long reverb times

Balancing early reflections and decay for improved signal quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time low-latency single-channel speech enhancement

Focus on large rooms with long reverberation times

Preserve early reflections for better signal quality

🔎 Similar Papers

No similar papers found.

Authors to Follow