Multi-Speaker DOA Estimation in Binaural Hearing Aids using Deep Learning and Speaker Count Fusion

📅 2025-09-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address insufficient direction-of-arrival (DOA) estimation accuracy for binaural hearing aids in multi-speaker noisy environments, this paper studies a CRNN-based model that uses inter-channel phase differences and magnitude ratios, and incorporates the ground-truth source count as an auxiliary structural prior. The authors evaluate early, middle, and late fusion strategies and find that late fusion of the true number of active sources yields the greatest improvement, raising the average F1-score by up to 14% over the baseline CRNN. Joint training for DOA estimation and source counting does not improve DOA accuracy, although it does improve source-count estimation. The results indicate that source cardinality can serve as a useful prior for robust multi-speaker DOA estimation in hearing aid applications.

📝 Abstract

For extracting a target speaker's voice, direction-of-arrival (DOA) estimation is crucial for binaural hearing aids operating in noisy, multi-speaker environments. Among the solutions developed for this task, a deep learning convolutional recurrent neural network (CRNN) model leveraging spectral phase differences and magnitude ratios between microphone signals is a popular option. In this paper, we explore adding source-count information for multi-source DOA estimation. We first consider dual-task training with joint multi-source DOA estimation and source counting. We then consider using the source count as an auxiliary feature in a standalone DOA estimation system, where the number of active sources (0, 1, or 2+) is integrated into the CRNN architecture through early, mid, and late fusion strategies. Experiments are performed on real binaural recordings. Results show that dual-task training does not improve DOA estimation performance, although it benefits source-count prediction. However, a ground-truth (oracle) source count used as an auxiliary feature significantly enhances standalone DOA estimation performance, with late fusion yielding up to 14% higher average F1-scores than the baseline CRNN. This highlights the potential of using source-count estimation for robust DOA estimation in binaural hearing aids.
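The abstract mentions spectral phase differences and magnitude ratios between microphone signals as the CRNN input. As a rough illustration only, the sketch below computes such features for a single left/right microphone pair with NumPy. The STFT parameters, the cos/sin encoding of the phase difference, the dB scaling of the magnitude ratio, and the function names `stft` and `binaural_features` are all assumptions made for this example; the paper's actual feature pipeline may differ.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Naive Hann-windowed STFT; returns a complex array (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def binaural_features(left, right, eps=1e-8):
    """Hypothetical feature extractor: inter-channel phase difference and magnitude ratio.

    Returns an array of shape (3, frames, bins) holding cos(IPD), sin(IPD),
    and the left/right magnitude ratio in dB.
    """
    spec_l, spec_r = stft(left), stft(right)
    # Phase of the cross-spectrum equals phase(left) - phase(right), wrapped to (-pi, pi].
    ipd = np.angle(spec_l * np.conj(spec_r))
    # Magnitude ratio in dB; eps avoids division by zero in silent bins.
    ratio_db = 20.0 * np.log10((np.abs(spec_l) + eps) / (np.abs(spec_r) + eps))
    # cos/sin encoding (an assumption here) avoids the discontinuity at +/- pi.
    return np.stack([np.cos(ipd), np.sin(ipd), ratio_db], axis=0)
```

For a quick sanity check, passing a signal and a half-amplitude copy of it should give a phase difference of zero and a magnitude ratio of about 6 dB in every time-frequency bin.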

Problem

Research questions and friction points this paper is trying to address.

Estimating speaker directions in noisy multi-speaker environments for hearing aids

Improving DOA estimation using source-count information fusion strategies

Enhancing binaural hearing aid performance through deep learning approaches

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using deep learning CRNN for DOA estimation

Integrating source count as auxiliary feature

Late fusion of an oracle source count improves the average F1-score by up to 14% over the baseline CRNN
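To make the late fusion idea concrete, here is a minimal NumPy sketch under stated assumptions: the source count (0, 1, or 2+) is one-hot encoded and concatenated with the per-frame output of the recurrent layers just before a final dense layer that scores DOA classes. The layer sizes, the sigmoid multi-label output, and the names `one_hot_count` and `late_fusion_head` are hypothetical and chosen for illustration; the page does not specify the paper's exact architecture.

```python
import numpy as np

N_COUNT_CLASSES = 3  # 0, 1, or 2+ active sources, as described in the abstract

def one_hot_count(counts):
    """One-hot encode per-frame source counts; counts above 2 fall into the '2+' class."""
    counts = np.minimum(np.asarray(counts), N_COUNT_CLASSES - 1)
    return np.eye(N_COUNT_CLASSES)[counts]

def late_fusion_head(hidden, counts, weights, bias):
    """Hypothetical late-fusion output layer.

    hidden:  (frames, hidden_dim) features from the recurrent layers
    counts:  (frames,) integer source counts (oracle or estimated)
    weights: (hidden_dim + 3, n_doa_classes), bias: (n_doa_classes,)
    Returns per-frame DOA class probabilities via an element-wise sigmoid.
    """
    fused = np.concatenate([hidden, one_hot_count(counts)], axis=-1)
    logits = fused @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))
```

In this sketch, early and mid fusion would differ only in where the count vector is concatenated: at the input features or between the convolutional and recurrent stages, respectively.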

🔎 Similar Papers

A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation

2024-09-19 · arXiv.org · Citations: 0
