AI Summary
Most existing speech separation methods neglect explicit phase spectrum modeling, leading to incomplete time-frequency reconstruction and limited fidelity. To address this, we propose a novel neural separation model that jointly estimates magnitude and phase spectra in parallel within an end-to-end framework (the first to explicitly co-model both components). Our architecture integrates deep feature fusion with a time-frequency Transformer to capture long-range temporal and spectral dependencies, while a dual-branch parallel network separately optimizes magnitude and phase prediction. This design avoids error accumulation inherent in conventional implicit phase recovery or post-hoc phase estimation. Evaluated on standard benchmarks (WSJ0-2mix, Libri2Mix), our method significantly outperforms state-of-the-art time-domain and implicit-phase approaches, achieving higher SI-SNR improvement (SI-SNRi), enhanced speech intelligibility, and superior generalization and robustness.
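Since the summary reports results as SI-SNR improvement, a minimal sketch of how that metric is commonly computed may help. It uses the widely used scale-invariant SNR definition and is not taken from the paper; the function names `si_snr` and `si_snr_improvement`, the `eps` stabilizer, and the toy sinusoid signals are all illustrative assumptions.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a reference signal."""
    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    tgt_mean = sum(target) / len(target)
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target to isolate the target component.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]
    e_noise = [e - s for e, s in zip(est, s_target)]

    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(v * v for v in e_noise) + eps
    return 10.0 * math.log10(signal_energy / noise_energy + eps)


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)


# Toy data (assumption): two orthogonal sinusoids stand in for the target
# speaker and the interfering speaker.
n = 200
target = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
interferer = [math.sin(2 * math.pi * 13 * i / n) for i in range(n)]
mixture = [t + v for t, v in zip(target, interferer)]
# A hypothetical separator output that attenuates the interferer to 10%.
estimate = [t + 0.1 * v for t, v in zip(target, interferer)]
gain_db = si_snr_improvement(estimate, mixture, target)
```

In this toy case the interferer's amplitude drops by a factor of 10, so `gain_db` comes out at roughly 20 dB. Rescaling `estimate` by any nonzero constant leaves the score essentially unchanged, which is the point of the scale-invariant form.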
Abstract
This paper proposes APSS, a novel neural speech separation model with parallel amplitude and phase spectrum estimation. Unlike most existing speech separation methods, APSS explicitly estimates the phase spectrum for more complete and accurate separation. Specifically, APSS first extracts the amplitude and phase spectra from the mixed speech signal. A feature combiner then fuses the extracted spectra into joint representations, which a deep processor with time-frequency Transformers further processes to capture temporal and spectral dependencies. Finally, parallel amplitude and phase separators estimate the respective spectra for each speaker from the resulting features, and these spectra are combined via the inverse short-time Fourier transform (iSTFT) to reconstruct the separated speech signals. Experimental results indicate that APSS surpasses both time-domain separation methods and implicit-phase-estimation-based time-frequency approaches. APSS also achieves stable and competitive results on multiple datasets, highlighting its strong generalization capability and practical applicability.
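The first and last steps of this pipeline (extracting amplitude and phase spectra, then recombining them via iSTFT) can be sketched in plain Python. This is an illustration under stated assumptions, not the APSS implementation: the frame length, hop size, and Hann window are arbitrary choices, all function names are hypothetical, and the mixture's own spectra stand in for the per-speaker spectra that the separators would estimate.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def stft(signal, frame_len=16, hop=8):
    """Short-time Fourier transform with a periodic Hann analysis window."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    return [dft([signal[start + i] * window[i] for i in range(frame_len)])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def istft(frames, frame_len=16, hop=8):
    """Overlap-add inverse STFT. A periodic Hann window at 50% overlap sums
    to one, so no synthesis window is applied (assumes hop == frame_len // 2)."""
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for idx, spec in enumerate(frames):
        for i, sample in enumerate(idft(spec)):
            out[idx * hop + i] += sample
    return out


def split_amplitude_phase(frames):
    """Split complex spectra into amplitude and phase spectra."""
    amplitude = [[abs(c) for c in frame] for frame in frames]
    phase = [[cmath.phase(c) for c in frame] for frame in frames]
    return amplitude, phase


def combine_amplitude_phase(amplitude, phase):
    """Rebuild complex spectra as amplitude * exp(j * phase)."""
    return [[cmath.rect(a, p) for a, p in zip(a_row, p_row)]
            for a_row, p_row in zip(amplitude, phase)]


# Toy mixture (assumption): two sinusoids instead of real speech.
mixture = [math.sin(0.3 * i) + 0.5 * math.sin(1.1 * i) for i in range(64)]
amplitude, phase = split_amplitude_phase(stft(mixture))
# In a separation model, estimated per-speaker spectra would be used here.
rebuilt = istft(combine_amplitude_phase(amplitude, phase))

# Samples covered by two overlapping frames are reconstructed; the first and
# last hop are covered by one frame only and remain attenuated by the window.
hop = 8
max_err = max(abs(rebuilt[i] - mixture[i]) for i in range(hop, len(mixture) - hop))
```

Because the complex spectrum is fully determined by its amplitude and phase, the interior samples of `rebuilt` match `mixture` up to floating-point error. A model that estimates only the amplitude has to borrow or approximate the phase, so the same guarantee does not hold for its output.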