🤖 AI Summary
Spatial aliasing in spaced microphone arrays causes directional ambiguity at high frequencies, degrading both the spatial resolution and the spectral fidelity of beamforming. To address this, and given the scarcity of deep learning approaches to spatial aliasing mitigation, we propose a U-Net–based method that predicts signal-dependent, multichannel de-aliasing filters. Two filter types are considered: one treats each channel's time-frequency representation independently, while the other explicitly models inter-channel spatial dependencies. The approach is evaluated in two common spatial capture scenarios: stereo and first-order Ambisonics. Experiments show that it significantly outperforms conventional beamforming in both objective metrics and perceptual listening quality, suppressing aliasing artifacts, improving spatial localization accuracy, and enhancing spectral reconstruction fidelity.
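For intuition about where the ambiguity sets in: for microphones spaced a distance d apart, directions are sampled unambiguously only while the spacing stays below half a wavelength, so aliasing appears above a spacing-dependent frequency. This is the standard spatial-Nyquist condition, stated here for context (the example numbers are illustrative, not from the paper):

```latex
% Spatial-Nyquist condition for microphone spacing d and speed of sound c:
% unambiguous spatial sampling requires d <= lambda/2,
% so spatial aliasing sets in above
\[
  f_{\text{alias}} = \frac{c}{2d},
  \qquad \text{e.g. } d = 0.1\,\mathrm{m},\; c = 343\,\mathrm{m/s}
  \;\Rightarrow\; f_{\text{alias}} \approx 1.7\,\mathrm{kHz}.
\]
```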
📝 Abstract
Spatial aliasing affects spaced microphone arrays, causing directional ambiguity above certain frequencies and degrading the spatial and spectral accuracy of beamformers. Given the limitations of conventional signal processing and the scarcity of deep learning approaches to spatial aliasing mitigation, we propose a novel approach that uses a U-Net architecture to predict a signal-dependent de-aliasing filter, which reduces aliasing in conventional beamforming for spatial capture. Two types of multichannel filters are considered: one treats the channels independently, and a second models cross-channel dependencies. The proposed approach is evaluated in two common spatial capture scenarios: stereo and first-order Ambisonics. The results indicate a significant improvement, both objective and perceptual, over conventional beamforming. This work shows the potential of deep learning to reduce aliasing in beamforming, leading to improved performance in multi-microphone setups.
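To make the two filter types concrete, below is a minimal PyTorch sketch, not the authors' code: the module names, layer sizes, magnitude-spectrogram input, and sigmoid-mask output are all illustrative assumptions. A small U-Net predicts a time-frequency filter that is either applied to each channel independently or conditioned on all channels jointly, so that only the second variant can exploit cross-channel structure.

```python
import torch
import torch.nn as nn


class TinyUNet(nn.Module):
    """2-level U-Net over (freq, time) maps; a small stand-in for the
    full U-Net the paper uses to predict the de-aliasing filter."""

    def __init__(self, in_ch: int, out_ch: int, width: int = 16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU())
        self.down = nn.Conv2d(width, 2 * width, 3, stride=2, padding=1)
        self.bott = nn.Sequential(nn.Conv2d(2 * width, 2 * width, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(2 * width, width, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(2 * width, width, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(width, out_ch, 1)

    def forward(self, x):                       # x: (B, in_ch, F, T), F and T even
        e = self.enc(x)
        u = self.up(self.bott(self.down(e)))    # back to (B, width, F, T)
        return self.head(self.dec(torch.cat([u, e], dim=1)))


class ChannelIndependentFilter(nn.Module):
    """Variant 1: one shared U-Net filters each channel's spectrogram
    on its own, ignoring inter-channel structure."""

    def __init__(self):
        super().__init__()
        self.unet = TinyUNet(1, 1)

    def forward(self, spec):                    # spec: (B, M, F, T) magnitudes
        b, m, f, t = spec.shape
        mask = self.unet(spec.reshape(b * m, 1, f, t)).reshape(b, m, f, t)
        return torch.sigmoid(mask) * spec       # bounded TF filter per channel


class CrossChannelFilter(nn.Module):
    """Variant 2: the U-Net sees all channels at once, so the predicted
    filter can exploit cross-channel (spatial) dependencies."""

    def __init__(self, n_mics: int):
        super().__init__()
        self.unet = TinyUNet(n_mics, n_mics)

    def forward(self, spec):                    # spec: (B, M, F, T)
        return torch.sigmoid(self.unet(spec)) * spec


if __name__ == "__main__":
    x = torch.randn(2, 4, 256, 64).abs()        # toy 4-mic magnitude STFTs
    print(ChannelIndependentFilter()(x).shape)  # torch.Size([2, 4, 256, 64])
    print(CrossChannelFilter(4)(x).shape)       # torch.Size([2, 4, 256, 64])
```

In this sketch the filter is a real-valued mask on magnitude spectrograms for simplicity; a complex-valued filter applied to the multichannel STFT would be the natural extension for preserving the inter-channel phase cues that beamforming relies on.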