How much to Dereverberate? Low-Latency Single-Channel Speech Enhancement in Distant Microphone Scenarios

📅 2025-04-06
🏛️ IEEE International Conference on Acoustics, Speech, and Signal Processing
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses low-latency single-channel speech enhancement in large reverberant spaces such as conference rooms and theaters, characterized by distant microphones (5–10 m), large room volumes (>1000 m³), and long reverberation times (T60 > 1 s). We first demonstrate that single-channel speech enhancement is feasible under these far-field conditions. We then propose an early-reflection-aware dereverberation strategy that preserves early reflections rather than suppressing all reverberation, departing from the conventional full-suppression paradigm. A physics-informed method for randomly simulating room impulse responses (RIRs) explicitly models the dependence of T60 on room volume, which is critical for generating realistic training data. Furthermore, we develop a lightweight, real-time deep time-frequency masking network. Experiments show substantial improvements: +2.1 in PESQ, +18.3% in STOI, and end-to-end latency <40 ms, improving speech intelligibility and naturalness while meeting strict real-time constraints.

📝 Abstract
Dereverberation is an important sub-task of Speech Enhancement (SE) to improve the signal's intelligibility and quality. However, it remains challenging because the reverberation is highly correlated with the signal. Furthermore, the single-channel SE literature has predominantly focused on rooms with short reverb times (typically under 1 second), smaller rooms (under volumes of 1000 cubic meters) and relatively short distances (up to 2 meters). In this paper, we explore real-time low-latency single-channel SE under distant microphone scenarios, such as 5 to 10 meters, and focus on conference rooms and theatres, with larger room dimensions and reverberation times. Such a setup is useful for applications such as lecture demonstrations, drama, and to enhance stage acoustics. First, we show that single-channel SE in such challenging scenarios is feasible. Second, we investigate the relationship between room volume and reverberation time, and demonstrate its importance when randomly simulating room impulse responses. Lastly, we show that for dereverberation with short decay times, preserving early reflections before decaying the transfer function of the room improves overall signal quality.
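
The link between room volume and reverberation time mentioned in the abstract can be illustrated with Sabine's classical formula, T60 ≈ 0.161 · V / (S · α). The sketch below is not the authors' simulation method. It only shows one plausible way to derive T60 from sampled room dimensions instead of drawing it independently. The function names, the shoebox-room assumption, the single mean absorption coefficient, and all sampling ranges are hypothetical.

```python
import random

SABINE_CONSTANT = 0.161  # s/m, Sabine's constant for metric units


def sabine_t60(dims, absorption):
    """Estimate T60 in seconds for a shoebox room using Sabine's formula.

    dims: (length, width, height) in metres.
    absorption: mean surface absorption coefficient in (0, 1]; a single
    value for all surfaces is a simplifying assumption.
    """
    length, width, height = dims
    volume = length * width * height
    surface = 2.0 * (length * width + length * height + width * height)
    return SABINE_CONSTANT * volume / (surface * absorption)


def sample_room(rng):
    """Draw a random large room and derive its T60 from the geometry.

    The ranges below are illustrative only; they are chosen so that the
    volume is always above 1000 cubic metres.
    """
    dims = (
        rng.uniform(12.0, 30.0),  # length in metres
        rng.uniform(12.0, 30.0),  # width in metres
        rng.uniform(7.0, 12.0),   # height in metres
    )
    absorption = rng.uniform(0.1, 0.4)
    return dims, absorption, sabine_t60(dims, absorption)


if __name__ == "__main__":
    rng = random.Random(0)
    dims, absorption, t60 = sample_room(rng)
    print(f"dims={dims}, absorption={absorption:.2f}, T60={t60:.2f} s")
```

Because T60 is computed from the sampled geometry, a larger room with the same absorption always receives a longer reverberation time, which avoids physically implausible pairs such as a very large hall with a very short T60.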
Problem

Research questions and friction points this paper is trying to address.

Real-time low-latency single-channel speech enhancement in distant microphone scenarios
Exploring dereverberation in large rooms with long reverb times
Balancing early reflections and decay for improved signal quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time low-latency single-channel speech enhancement
Focus on large rooms with long reverberation times
Preserve early reflections for better signal quality
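
The idea of preserving early reflections before decaying the room's transfer function can be sketched as a way of shaping a training-target impulse response. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `shape_target_rir`, the 50 ms early window, and the `decay_time` parameter are all hypothetical choices.

```python
def shape_target_rir(rir, sample_rate, direct_index, early_ms=50.0, decay_time=0.3):
    """Return a copy of `rir` with early reflections kept and the tail attenuated.

    Taps up to `early_ms` milliseconds after the direct path are left
    unchanged. Later taps are multiplied by an exponential gain that adds
    60 dB of attenuation for every `decay_time` seconds past the early
    window, on top of the natural decay already present in the RIR.
    """
    early_end = direct_index + int(round(early_ms * 1e-3 * sample_rate))
    shaped = []
    for n, tap in enumerate(rir):
        if n <= early_end:
            # Direct sound and early reflections: keep as-is.
            shaped.append(tap)
        else:
            # Late reverberation: apply the additional exponential decay.
            t = (n - early_end) / sample_rate
            shaped.append(tap * 10.0 ** (-3.0 * t / decay_time))
    return shaped


if __name__ == "__main__":
    # Toy RIR: one second of constant taps at 1 kHz, direct path at index 0.
    toy_rir = [1.0] * 1000
    target = shape_target_rir(toy_rir, sample_rate=1000, direct_index=0)
    print(target[50], target[350])
```

Convolving clean speech with the shaped response would give a target signal that retains early reflections while the late tail is suppressed. Setting `early_ms` to zero approximates a target that attenuates everything after the direct path.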
Satvik Venkatesh
L-Acoustics, 67 Southwood Lane, London N6 5EG.
Philip Coleman
L-Acoustics
sound field control, sound zones, spatial audio, acoustics, audio signal processing
Arthur Benilov
L-Acoustics, 67 Southwood Lane, London N6 5EG.
Simon Brown
Consensys
web3, cryptocurrency
Selim Sheta
L-Acoustics, 67 Southwood Lane, London N6 5EG.
Frederic Roskam
L-Acoustics, 67 Southwood Lane, London N6 5EG.