🤖 AI Summary
Addressing the challenge of simultaneously achieving ultra-low latency (≤5 ms) and high speech intelligibility in hearing aids, this paper proposes an end-to-end, multi-stage complex-domain speech enhancement system. Methodologically: (i) an asymmetric analysis–synthesis window pair is designed to retain frequency resolution under the stringent latency constraint; (ii) speech magnitude and phase are jointly modeled in the complex spectrogram domain, with head rotation information incorporated to aid source separation; and (iii) a post-processing module is introduced that is tailored to the hearing aid amplification stage and targets the hearing aid speech perception index (HASPI). The key contribution is the integration, within an ultra-low-latency architecture, of head rotation signals, complex-spectrogram modeling, and perception-driven post-processing. Evaluated on the ICASSP 2023 Clarity Challenge benchmark, the proposed system achieves higher HASPI scores, demonstrating its effectiveness for improving perceived speech quality under challenging noisy conditions.
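The magnitude-then-complex masking idea in (ii) can be sketched with stand-in tensors. The masks below are random placeholders for what the paper's networks would actually predict; all shapes and names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
F, T = 257, 100  # illustrative frequency bins x frames

# Noisy complex STFT (stand-in for the microphone mixture).
noisy = rng.normal(size=(F, T)) + 1j * rng.normal(size=(F, T))

# Stage 1: a real-valued magnitude mask in (0, 1) rescales the
# magnitude but leaves the noisy phase untouched.
mag_mask = 1.0 / (1.0 + np.exp(-rng.normal(size=(F, T))))  # sigmoid stand-in
stage1 = mag_mask * noisy

# Stage 2: a complex ratio mask refines stage 1; complex multiplication
# rescales the magnitude AND rotates the phase, which is how a
# complex-domain stage enhances phase jointly with magnitude.
crm = rng.normal(size=(F, T)) + 1j * rng.normal(size=(F, T))
stage2 = crm * stage1

# Stage 1 preserves the noisy phase; stage 2 modifies it.
assert np.allclose(np.angle(stage1), np.angle(noisy))
assert np.abs(np.angle(stage2) - np.angle(noisy)).max() > 0
```

Applying a strictly positive real mask first and a complex mask second mirrors the multi-stage split between the magnitude and complex domains described above.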
📝 Abstract
This paper proposes an end-to-end system for the ICASSP 2023 Clarity Challenge. In this work, we introduce four major novelties: (1) a multi-stage system operating in both the magnitude and complex domains to better utilize phase information; (2) an asymmetric window pair that achieves higher frequency resolution under the 5 ms latency constraint; (3) the integration of head rotation information with the mixture signals to achieve better enhancement; and (4) a post-processing module that achieves higher hearing aid speech perception index (HASPI) scores with the hearing aid amplification stage provided by the baseline system.
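Novelty (2), the asymmetric window pair, can be illustrated with a common sqrt-Hann construction (the exact windows in the paper may differ, and the lengths below are illustrative assumptions): a long analysis window gives fine frequency resolution, while a synthesis window supported only on the last `2 * hop` samples keeps the algorithmic latency well below the analysis length.

```python
import numpy as np

def asymmetric_window_pair(n_fft: int, hop: int):
    """One common long-analysis / short-synthesis window construction.

    The analysis window joins a long rising sqrt-Hann edge to a short
    falling sqrt-Hann edge; the synthesis window is zero except over the
    last 2*hop samples, chosen so that the analysis-synthesis product is
    a short Hann window that overlap-adds to 1 at hop-sized shifts.
    """
    N, d = n_fft, hop
    assert N > 2 * d

    def hann(L):  # periodic Hann: sin^2(pi * n / L)
        n = np.arange(L)
        return np.sin(np.pi * n / L) ** 2

    w_a = np.empty(N)
    w_a[: N - d] = np.sqrt(hann(2 * (N - d))[: N - d])  # long rising edge
    w_a[N - d:] = np.sqrt(hann(2 * d)[d:])              # short falling edge

    w_s = np.zeros(N)
    w_s[N - 2 * d:] = hann(2 * d) / w_a[N - 2 * d:]     # product = short Hann
    return w_a, w_s

# e.g. a 32 ms analysis window with a 2 ms hop at 16 kHz (illustrative
# numbers): output only waits for the short synthesis support.
w_a, w_s = asymmetric_window_pair(n_fft=512, hop=32)
prod = w_a * w_s

# Overlap-add the product window at hop-sized shifts: the steady-state
# region sums to 1, i.e. perfect reconstruction away from the edges.
ola = np.zeros(512 + 9 * 32)
for m in range(10):
    ola[m * 32 : m * 32 + 512] += prod
assert np.allclose(ola[512 - 32 : 512 + 8 * 32], 1.0)
```

Because the synthesis window is confined to the tail of the frame, the reconstruction delay scales with the hop rather than with the FFT size, which is what makes a long analysis window compatible with a 5 ms latency budget.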