🤖 AI Summary
To improve direction-of-arrival (DOA) estimation for binaural hearing aids in noisy, multi-speaker environments, this paper builds on a CRNN model that uses inter-channel phase differences and magnitude ratios, and adds source-count information to it. Two approaches are studied: dual-task training for joint DOA estimation and source counting, and feeding the number of active sources (0, 1, or 2+) into a standalone DOA estimator through early, mid, or late fusion. With a ground-truth (oracle) source count, late fusion gives the largest gain, raising the average F1-score by up to 14% over the baseline CRNN. Dual-task training does not improve DOA accuracy, although it does benefit source-count prediction. The experiments, run on real binaural recordings, indicate that the source count is a useful auxiliary feature for robust multi-source DOA estimation in hearing aids.
📝 Abstract
To extract a target speaker's voice, direction-of-arrival (DOA) estimation is crucial for binaural hearing aids operating in noisy, multi-speaker environments. Among the solutions developed for this task, a convolutional recurrent neural network (CRNN) model that leverages spectral phase differences and magnitude ratios between microphone signals is a popular option. In this paper, we explore adding source-count information for multi-source DOA estimation. We first consider dual-task training with joint multi-source DOA estimation and source counting. We then consider using the source count as an auxiliary feature in a standalone DOA estimation system, where the number of active sources (0, 1, or 2+) is integrated into the CRNN architecture through early, mid, and late fusion strategies. Experiments are performed on real binaural recordings. Results show that dual-task training does not improve DOA estimation performance, although it benefits source-count prediction. However, a ground-truth (oracle) source count used as an auxiliary feature significantly enhances standalone DOA estimation performance, with late fusion yielding up to 14% higher average F1-scores than the baseline CRNN. This highlights the potential of using source-count estimation for robust DOA estimation in binaural hearing aids.
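As a rough illustration of the ideas in the abstract, the sketch below computes inter-channel phase differences and log magnitude ratios from two complex spectra, then performs a late fusion by appending a one-hot source count (0, 1, or 2+) to an embedding just before the output layer. Everything here is an assumption made for illustration: the function names, the array sizes, the 72 direction classes, the random stand-in for the CRNN embedding, and the single linear output layer are hypothetical and are not taken from the paper.

```python
import numpy as np

def count_to_one_hot(counts):
    """Map per-frame source counts to the three classes 0, 1, and 2+."""
    classes = np.minimum(np.asarray(counts), 2)   # class 2 means "two or more"
    return np.eye(3)[classes]                     # shape (frames, 3)

def interchannel_features(left_spec, right_spec, eps=1e-8):
    """Phase difference and log magnitude ratio between two complex spectra."""
    phase_diff = np.angle(left_spec * np.conj(right_spec))
    log_mag_ratio = np.log((np.abs(left_spec) + eps) / (np.abs(right_spec) + eps))
    return phase_diff, log_mag_ratio

def late_fusion(embedding, counts, weights, bias):
    """Append the one-hot count to the embedding right before the output layer."""
    fused = np.concatenate([embedding, count_to_one_hot(counts)], axis=-1)
    logits = fused @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))          # per-direction activity in (0, 1)

# Hypothetical sizes; a random array stands in for the CRNN embedding.
rng = np.random.default_rng(0)
frames, embed_dim, n_directions = 4, 16, 72
embedding = rng.standard_normal((frames, embed_dim))
counts = [0, 1, 2, 3]                             # oracle counts per frame
weights = rng.standard_normal((embed_dim + 3, n_directions)) * 0.1
bias = np.zeros(n_directions)
doa_probs = late_fusion(embedding, counts, weights, bias)
```

Under the same assumptions, an early fusion would instead append the count to the input features, and a mid fusion would inject it between the convolutional and recurrent stages; the abstract does not specify the exact layers involved.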