Neural Speech Separation with Parallel Amplitude and Phase Spectrum Estimation

📅 2025-09-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Most existing speech separation methods neglect explicit phase spectrum modeling, leading to incomplete time-frequency reconstruction and limited fidelity. To address this, we propose a novel neural separation model that jointly estimates magnitude and phase spectra in parallel within an end-to-end framework, the first to explicitly co-model both components. Our architecture integrates deep feature fusion with a time-frequency Transformer to capture long-range temporal and spectral dependencies, while a dual-branch parallel network separately optimizes magnitude and phase prediction. This design avoids the error accumulation inherent in conventional implicit phase recovery or post-hoc phase estimation. Evaluated on standard benchmarks (WSJ0-2mix, Libri2Mix), our method significantly outperforms state-of-the-art time-domain and implicit-phase approaches, achieving higher SI-SNR improvement (SI-SNRi), enhanced speech intelligibility, and superior generalization and robustness.

๐Ÿ“ Abstract
This paper proposes APSS, a novel neural speech separation model with parallel amplitude and phase spectrum estimation. Unlike most existing speech separation methods, APSS explicitly estimates the phase spectrum for more complete and accurate separation. Specifically, APSS first extracts the amplitude and phase spectra from the mixed speech signal. The extracted amplitude and phase spectra are then fused by a feature combiner into joint representations, which are further processed by a deep processor with time-frequency Transformers to capture temporal and spectral dependencies. Finally, leveraging parallel amplitude and phase separators, APSS estimates the respective spectra for each speaker from the resulting features, which are then combined via the inverse short-time Fourier transform (iSTFT) to reconstruct the separated speech signals. Experimental results indicate that APSS surpasses both time-domain separation methods and implicit-phase-estimation-based time-frequency approaches. APSS also achieves stable and competitive results on multiple datasets, highlighting its strong generalization capability and practical applicability.
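The pipeline described in the abstract (STFT feature extraction → feature combiner → time-frequency Transformer processor → parallel amplitude/phase separators → per-speaker iSTFT) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: all layer sizes, the single-axis `TransformerEncoder` (the paper attends over both time and frequency), and every module name here are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the APSS pipeline; hyperparameters are illustrative.
N_FFT, HOP, DIM, N_SPK = 512, 128, 64, 2
N_BINS = N_FFT // 2 + 1

class APSSSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Feature combiner: fuse amplitude and phase into a joint embedding
        self.combiner = nn.Linear(2 * N_BINS, DIM)
        # Deep processor: Transformer over time frames (the paper also models
        # the frequency axis; one axis is shown here for brevity)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.processor = nn.TransformerEncoder(layer, num_layers=2)
        # Parallel separators: one head per spectrum type, all speakers at once
        self.amp_head = nn.Linear(DIM, N_SPK * N_BINS)
        self.phase_head = nn.Linear(DIM, N_SPK * N_BINS)

    def forward(self, mix):  # mix: (batch, samples)
        win = torch.hann_window(N_FFT)
        spec = torch.stft(mix, N_FFT, HOP, window=win,
                          return_complex=True)                  # (B, F, T)
        amp, phase = spec.abs(), spec.angle()
        feats = torch.cat([amp, phase], dim=1).transpose(1, 2)  # (B, T, 2F)
        h = self.processor(self.combiner(feats))                # (B, T, D)
        B, T, _ = h.shape
        amps = self.amp_head(h).view(B, T, N_SPK, N_BINS)
        phases = self.phase_head(h).view(B, T, N_SPK, N_BINS)
        # Recombine each speaker's amplitude and phase, then invert via iSTFT
        est = amps * torch.exp(1j * phases)
        est = est.permute(0, 2, 3, 1).reshape(B * N_SPK, N_BINS, T)
        wav = torch.istft(est, N_FFT, HOP, window=win, length=mix.shape[-1])
        return wav.view(B, N_SPK, -1)

mix = torch.randn(1, 16000)
out = APSSSketch()(mix)
print(out.shape)  # torch.Size([1, 2, 16000])
```

The key structural point is that amplitude and phase flow through separate output heads over a shared representation, rather than the phase being inherited from the mixture or recovered after the fact.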
Problem

Research questions and friction points this paper is trying to address.

Explicitly estimates phase spectrum for accurate speech separation
Fuses amplitude and phase spectra into joint representations
Reconstructs separated speech signals via parallel spectrum estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel amplitude and phase spectrum estimation
Feature fusion with time-frequency Transformers processing
Inverse short-time Fourier transform signal reconstruction
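As a small motivating check for the last bullet: amplitude and phase together form a lossless representation of the signal, so if both are estimated well per speaker, the iSTFT reconstructs the waveform essentially exactly. The round trip below is generic STFT arithmetic, not code from the paper; all values are arbitrary.

```python
import torch

# STFT -> (amplitude, phase) -> complex spectrum -> iSTFT round trip.
n_fft, hop = 512, 128
win = torch.hann_window(n_fft)
x = torch.randn(16000)

spec = torch.stft(x, n_fft, hop, window=win, return_complex=True)
amp, phase = spec.abs(), spec.angle()

# Recombine amplitude and phase, then invert back to the waveform
rec = torch.istft(amp * torch.exp(1j * phase), n_fft, hop,
                  window=win, length=x.numel())
print(torch.allclose(x, rec, atol=1e-4))  # True
```

Implicit-phase methods break this round trip by reusing the mixture's phase, which is exactly the fidelity gap the parallel estimation is meant to close.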
Fei Liu
National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, China
Yang Ai
Associate Researcher, University of Science and Technology of China
Speech Synthesis · Speech Enhancement · Speech Coding · Deep Learning
Zhen-Hua Ling
National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, China