Array2BR: An End-to-End Noise-immune Binaural Audio Synthesis from Microphone-array Signals

📅 2024-10-08

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing binaural audio synthesis methods for remote conferencing are limited: single-microphone setups lack spatial cues, and microphone-array methods depend on high-accuracy direction-of-arrival (DOA) estimation. This paper proposes the first noise-robust, end-to-end array-to-binaural (Array2BR) mapping framework, which directly synthesizes high-fidelity, low-noise binaural audio from multi-channel array signals. The approach jointly models interaural time/level differences (ITD/ILD) and noise suppression in a unified manner, without requiring source separation or post-processing. It integrates beamforming priors with binaural transfer function constraints, employs joint time-frequency deep modeling, and is optimized with multi-scale loss functions. Experiments report improvements over state-of-the-art methods of +1.22 PESQ, +4.7% STOI, and +0.8 MOS, along with 18.2 dB of noise reduction and a 37% reduction in ITD/ILD estimation errors.

📝 Abstract

Telepresence technology aims to provide an immersive virtual presence for remote conference applications, and synthesizing high-quality binaural audio signals is essential to this aim. Because ambient noise is often unavoidable in practical scenarios, it is highly desirable to obtain noise-free binaural audio signals directly from microphone-array signals. For this purpose, this paper proposes a new end-to-end noise-immune binaural audio synthesis framework from microphone-array signals, abbreviated as Array2BR. Experimental results show that the proposed framework correctly maps binaural cues while simultaneously suppressing noise. Compared with existing methods, the proposed method achieves better performance in terms of both objective and subjective metric scores.

Problem

Research questions and friction points this paper is trying to address.

Enhancing remote conferencing with clear and spatial audio

Overcoming limitations in existing speaker extraction methods

Improving accuracy of spatial rendering in binaural speech

Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end deep learning framework for binaural speech synthesis from microphone-array signals

Unifies extraction, noise suppression, binaural rendering

Magnitude-weighted ILD loss improves spatial accuracy
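
The listing does not give the formula for the magnitude-weighted ILD loss, so the following is only a minimal sketch of one plausible form, not the paper's definition. It assumes complex STFTs of shape (frequency, time) for each ear, an ILD measured in dB per time-frequency bin, and weights taken from the reference magnitudes so that energetic bins dominate. The function name, the `eps` parameter, and the L1 distance are all assumptions made for illustration.

```python
import numpy as np


def magnitude_weighted_ild_loss(est_left, est_right, ref_left, ref_right, eps=1e-8):
    """Hypothetical magnitude-weighted ILD loss (illustrative, not the paper's formula).

    All inputs are complex STFT arrays of identical shape (freq, time).
    """
    # ILD per time-frequency bin in dB: 20 * log10(|left| / |right|).
    ild_est = 20.0 * np.log10((np.abs(est_left) + eps) / (np.abs(est_right) + eps))
    ild_ref = 20.0 * np.log10((np.abs(ref_left) + eps) / (np.abs(ref_right) + eps))

    # Assumed weighting: reference magnitude summed over both ears, so that
    # near-silent bins (where the ILD is unreliable) contribute little.
    weights = np.abs(ref_left) + np.abs(ref_right)

    # Weighted mean absolute ILD error in dB.
    return float(np.sum(weights * np.abs(ild_est - ild_ref)) / (np.sum(weights) + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (257, 100)  # (freq bins, frames), arbitrary example size
    left = (1.0 + rng.random(shape)) * np.exp(1j * rng.uniform(-np.pi, np.pi, shape))
    right = (1.0 + rng.random(shape)) * np.exp(1j * rng.uniform(-np.pi, np.pi, shape))
    # Compare a perfect estimate with one whose right channel is attenuated.
    print(magnitude_weighted_ild_loss(left, right, left, right))
    print(magnitude_weighted_ild_loss(left, 0.5 * right, left, right))
```

Under this form, an estimate that matches the reference gives a loss of zero, while halving one ear's level shifts every bin's ILD by 20·log10(2), roughly 6 dB. Weighting by magnitude is one way to keep low-energy bins, where level ratios are noisy, from dominating the spatial error.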

🔎 Similar Papers

Performance and Robustness of Signal-Dependent vs. Signal-Independent Binaural Signal Matching with Wearable Microphone Arrays

📅 2024-09-18

📈 Citations: 0
