Audio Source Separation in Reverberant Environments using $β$-divergence based Nonnegative Factorization

📅 2026-04-14



🤖 AI Summary

This work addresses multichannel audio source separation in reverberant environments, where spatial aliasing and spectral interference often degrade performance. Within a Gaussian model-based framework, the authors estimate the source spectral variances and spatial covariance matrices using β-divergence-based nonnegative (tensor) factorization guided by spectral basis matrices. These bases are either extracted directly or selected from a pretrained overcomplete (redundant) dictionary, and the sparsity of the factorization is controlled by tuning β. The sources are then separated by multichannel Wiener filtering using the estimated parameters. Experiments under several mixing conditions show that the method achieves better separation quality than comparable algorithms, and that sparsity, rather than the value of β used in training, is the key factor in improving separation performance.
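As a rough illustration of the final filtering step, here is a minimal NumPy sketch of the standard multichannel Wiener filter for a Gaussian source model, $\hat{y}_j = v_j R_j \left(\sum_k v_k R_k\right)^{-1} x$ per time-frequency bin. The function name `multichannel_wiener`, the array layout, and the assumption of time-invariant spatial covariances $R_j(f)$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def multichannel_wiener(X, v, R):
    """Estimate source images from a multichannel STFT mixture (illustrative sketch).

    X: complex mixture STFT, shape (F, N, I)   F bins, N frames, I channels
    v: nonnegative source spectral variances, shape (J, F, N)
    R: spatial covariance matrices (assumed time-invariant), shape (J, F, I, I)
    Returns source image estimates, shape (J, F, N, I).
    """
    # Covariance of each source image in each bin: v_j(f, n) * R_j(f)
    Sigma_src = v[..., None, None] * R[:, :, None, :, :]   # (J, F, N, I, I)
    # Mixture covariance is the sum over sources
    Sigma_mix = Sigma_src.sum(axis=0)                       # (F, N, I, I)
    # Solve Sigma_mix @ z = x in every bin instead of forming the inverse
    z = np.linalg.solve(Sigma_mix, X[..., None])            # (F, N, I, 1)
    # Apply each source covariance to the whitened mixture
    Y = Sigma_src @ z[None]                                 # (J, F, N, I, 1)
    return Y[..., 0]

# Usage with random data: the source estimates add up to the mixture
rng = np.random.default_rng(0)
F, N, I, J = 4, 5, 2, 3
X = rng.standard_normal((F, N, I)) + 1j * rng.standard_normal((F, N, I))
v = rng.random((J, F, N)) + 0.1
A = rng.standard_normal((J, F, I, I)) + 1j * rng.standard_normal((J, F, I, I))
R = A @ np.conj(np.swapaxes(A, -1, -2)) + 0.1 * np.eye(I)   # Hermitian positive definite
Y = multichannel_wiener(X, v, R)
```

Because the filters for all sources share the same mixture covariance, the estimated source images sum exactly to the mixture. In practice a small diagonal loading is often added to the mixture covariance for numerical stability; that is omitted here.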


📝 Abstract

In Gaussian model-based multichannel audio source separation, the likelihood of observed mixtures of source signals is parametrized by source spectral variances and by associated spatial covariance matrices. These parameters are estimated by maximizing the likelihood through an Expectation-Maximization algorithm and used to separate the signals by means of multichannel Wiener filtering. We propose to estimate these parameters by nonnegative factorization that exploits prior information on the source variances. In this factorization, spectral basis matrices serve as the prior information. These matrices can either be extracted directly or be made available indirectly through a redundant library trained in advance. In a separate step, two algorithms based on nonnegative tensor factorization are proposed to either extract or detect the basis matrices that best represent the power spectra of the source signals in the observed mixtures. The factorization is achieved by minimizing the $β$-divergence through multiplicative update rules. The sparsity of the factorization can be controlled by tuning the value of $β$. Experiments show that sparsity, rather than the value assigned to $β$ in the training, is crucial for improving separation performance. The proposed method was evaluated under several mixing conditions and provides better separation quality than comparable algorithms.
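The abstract states that the factorization minimizes the $β$-divergence through multiplicative update rules, with spectral basis matrices supplied as prior information. The sketch below illustrates that idea in the simplest setting: a single power spectrogram $V \approx WH$ with a fixed, pretrained basis $W$, where only the activations $H$ are updated. It uses the widely known heuristic multiplicative update for β-NMF, not the paper's tensor (NTF) algorithms; the function names, random initialization, and `eps` guard are assumptions made for illustration.

```python
import numpy as np

def beta_divergence(V, V_hat, beta):
    """Sum of elementwise beta-divergences d_beta(V | V_hat), assuming V, V_hat > 0."""
    if beta == 0:   # Itakura-Saito divergence
        r = V / V_hat
        return float(np.sum(r - np.log(r) - 1))
    if beta == 1:   # generalized Kullback-Leibler divergence
        return float(np.sum(V * np.log(V / V_hat) - V + V_hat))
    return float(np.sum((V**beta + (beta - 1) * V_hat**beta
                         - beta * V * V_hat**(beta - 1)) / (beta * (beta - 1))))

def fit_activations(V, W, beta, n_iter=200, eps=1e-12, seed=0):
    """Fit nonnegative activations H so that V ~ W @ H, keeping the basis W fixed.

    V: strictly positive power spectrogram, shape (F, N)
    W: fixed nonnegative spectral basis (hypothetical pretrained dictionary), shape (F, K)
    Returns H with shape (K, N) and the divergence after each iteration.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    history = []
    for _ in range(n_iter):
        V_hat = W @ H + eps
        # Gradient of D_beta w.r.t. H splits into W^T V_hat^(beta-1) (positive part)
        # and W^T (V * V_hat^(beta-2)) (negative part); the update multiplies H by their ratio.
        num = W.T @ (V * V_hat ** (beta - 2))
        den = W.T @ (V_hat ** (beta - 1)) + eps
        H *= num / den
        history.append(beta_divergence(V, W @ H, beta))
    return H, history

# Usage: a random positive "spectrogram" that the fixed basis can explain
rng = np.random.default_rng(1)
F, K, N = 20, 5, 30
W = rng.random((F, K)) + 0.01
V = W @ rng.random((K, N)) + 0.01
H, hist = fit_activations(V, W, beta=1.0, n_iter=100)
```

Since $W$ is never updated, the rows of $H$ indicate how strongly each dictionary atom is used across frames, which is one simple way to see which atoms a mixture activates. For β between 1 and 2 this update is known to be non-increasing in the divergence; outside that range it is a heuristic. The abstract notes that the sparsity of the factorization can be tuned through β, which this sketch exposes simply as a parameter.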

Problem

Research questions and friction points this paper is trying to address.

Audio Source Separation

Reverberant Environments

Nonnegative Factorization

β-divergence

Multichannel Signal Processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

β-divergence

nonnegative tensor factorization

audio source separation