Fast-SDE: Efficient Single-Microphone Sound Source Distance Estimation in Reverberant Environments

📅 2026-06-10

🤖 AI Summary

This work addresses accurate and efficient sound source distance estimation with a single microphone in reverberant environments. The authors propose Fast-SDE, a framework built on a subband architecture: it decomposes the spectrum into multiple frequency bands, applies a shared encoder to extract compact time-frequency representations, and uses a lightweight regression head to predict distance. Together, these components balance estimation accuracy against computational cost. The authors report strong results in both simulated and real-world experiments. Because the method needs only one microphone, it avoids the hardware synchronization, extra space, and extra computation that microphone arrays require, which makes it well suited to resource-constrained robotic platforms.

📝 Abstract

Sound source distance estimation (SDE) is a critical capability in human-robot interaction. An inappropriate interaction distance not only reduces the reliability of speech acquisition and understanding, but also compromises the naturalness and comfort of the interaction. Most existing SDE methods rely on microphone arrays. However, multi-microphone systems typically require careful hardware synchronization, geometric calibration, and additional space and computational resources, which limits their applicability to size-constrained and computationally limited embodied platforms. To alleviate these issues, we propose Fast-SDE, a lightweight single-microphone SDE framework suited to deployment on robot platforms with limited computational resources and strict size constraints. Specifically, Fast-SDE employs a subband-based backbone that decomposes the frequency axis into multiple subbands, rather than processing the entire spectrum with a wide full-band backbone. A shared subband encoder then maps each subband to a compact latent representation and learns the relationship between acoustic structure and time-frequency patterns. Finally, a lightweight regression head converts the fused subband representations into the estimated distance. Extensive simulation and real-world experiments demonstrate the merits of the proposed method. To benefit the broader research community, we have open-sourced our code at https://github.com/JiangWAV/FAST-SDE.
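The three stages named in the abstract (subband decomposition, a shared subband encoder, and a lightweight regression head) can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation: the equal-width band split, the single linear layer with tanh, the mean pooling over time, the concatenation used for fusion, the softplus output, and every name and size below (`split_subbands`, `SubbandRegressor`, `latent_dim`, 257 frequency bins, 8 bands) are hypothetical choices made only for illustration. The weights are random, so the output is not a meaningful distance.

```python
import numpy as np


def split_subbands(spec, num_bands):
    """Split a (freq, time) spectrogram into equal-width subbands.

    Assumption: leftover top-frequency bins that do not fill a band are dropped.
    Returns an array of shape (num_bands, band_width, time).
    """
    freq_bins = spec.shape[0]
    usable = freq_bins - freq_bins % num_bands
    return spec[:usable].reshape(num_bands, usable // num_bands, -1)


class SubbandRegressor:
    """Hypothetical subband model: shared encoder plus a linear regression head."""

    def __init__(self, band_width, num_bands, latent_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.num_bands = num_bands
        # One encoder weight matrix, reused for every subband (the "shared" part).
        self.w_enc = rng.standard_normal((band_width, latent_dim)) * 0.1
        self.b_enc = np.zeros(latent_dim)
        # Regression head over the concatenated per-band latents.
        self.w_head = rng.standard_normal(num_bands * latent_dim) * 0.1
        self.b_head = 0.0

    def forward(self, spec):
        bands = split_subbands(spec, self.num_bands)         # (bands, width, time)
        frames = bands.transpose(0, 2, 1)                    # (bands, time, width)
        latents = np.tanh(frames @ self.w_enc + self.b_enc)  # (bands, time, latent)
        pooled = latents.mean(axis=1)                        # average over time
        fused = pooled.reshape(-1)                           # concatenate bands
        raw = fused @ self.w_head + self.b_head
        # Softplus keeps the predicted distance strictly positive (an assumption).
        return float(np.logaddexp(0.0, raw))


# Usage with a random stand-in for a magnitude spectrogram (257 bins, 100 frames).
spec = np.abs(np.random.default_rng(1).standard_normal((257, 100)))
model = SubbandRegressor(band_width=257 // 8, num_bands=8)
distance = model.forward(spec)
```

In this sketch the encoder's parameter count depends only on the band width and latent size, not on the number of subbands, which is one way a shared subband encoder can stay smaller than a layer spanning the full spectrum.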

Problem

Research questions and friction points this paper is trying to address.

sound source distance estimation

single-microphone

reverberant environments

human-robot interaction

computational efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

single-microphone

subband-based

sound source distance estimation