AI Summary
To address the challenges of long-sequence modeling, high computational complexity, and insufficient speech intelligibility in single-channel speech enhancement, this paper proposes xLSTM-SENet, the first end-to-end speech enhancement model based on the extended Long Short-Term Memory (xLSTM) architecture. It pioneers the application of xLSTM to speech enhancement by integrating exponential gating with bidirectional sequence modeling, coupled with a time-frequency domain mapping design, thereby substantially alleviating the computational bottleneck that attention-based models face on long utterances. Experiments on the VoiceBank+DEMAND dataset demonstrate that the best variant, xLSTM-SENet2, significantly outperforms state-of-the-art Mamba and Conformer baselines, while exhibiting linear time complexity and strong scalability to long sequences. The core contributions are: (1) the first empirical validation of xLSTM's effectiveness for speech enhancement; and (2) the identification of exponential gating and bidirectional modeling as critical factors driving the performance gains.
Abstract
While attention-based architectures, such as Conformers, excel at speech enhancement, they scale poorly with input sequence length. In contrast, the recently proposed Extended Long Short-Term Memory (xLSTM) architecture offers linear scalability. However, xLSTM-based models remain unexplored for speech enhancement. This paper introduces xLSTM-SENet, the first xLSTM-based single-channel speech enhancement system. A comparative analysis reveals that xLSTM (and, notably, even LSTM) can match or outperform state-of-the-art Mamba- and Conformer-based systems across various model sizes for speech enhancement on the VoiceBank+DEMAND dataset. Through ablation studies, we identify key architectural design choices, such as exponential gating and bidirectionality, that contribute to its effectiveness. Our best xLSTM-based model, xLSTM-SENet2, outperforms state-of-the-art Mamba- and Conformer-based systems on the VoiceBank+DEMAND dataset.
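To make the exponential gating mentioned above concrete, the following is a minimal NumPy sketch of one step of an sLSTM-style cell (the recurrent xLSTM variant): input and forget gates are exponentials of their pre-activations, stabilized in the log domain by a running maximum, with a normalizer state keeping the hidden state bounded. The weight layout and names here are illustrative, not the paper's actual implementation.

```python
import numpy as np

def slstm_step(x, h_prev, c_prev, n_prev, m_prev, W, R, b):
    """One sLSTM-style step with exponential gating and log-domain stabilization.

    W (4d x d), R (4d x d), and b (4d) hold stacked parameters for the
    cell-input (z), input-gate (i), forget-gate (f), and output-gate (o)
    pre-activations. n is the normalizer state, m the stabilizer state.
    """
    z_pre, i_pre, f_pre, o_pre = np.split(W @ x + R @ h_prev + b, 4)
    z = np.tanh(z_pre)                      # cell input
    o = 1.0 / (1.0 + np.exp(-o_pre))        # sigmoid output gate
    # Exponential input/forget gates, stabilized by the running max m
    # so that exp() never overflows.
    m = np.maximum(f_pre + m_prev, i_pre)
    i = np.exp(i_pre - m)
    f = np.exp(f_pre + m_prev - m)
    c = f * c_prev + i * z                  # cell state
    n = f * n_prev + i                      # normalizer state
    h = o * (c / n)                         # normalized hidden state
    return h, c, n, m
```

A bidirectional layer, as used in the model, would run such a recurrence over the sequence in both directions and combine the two hidden-state streams; the stabilizer `m` is what lets the exponential gates revise earlier storage decisions without numerical overflow.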