Improving Speech Enhancement by Cross- and Sub-band Processing with State Space Model

📅 2025-02-22
🤖 AI Summary
To address two challenges in speech enhancement with state space models (SSMs), namely modeling the feature differences between sub-bands and preserving low-energy high-frequency components, this paper proposes CSMamba (Cross- and Sub-band Mamba), a model built on a Mamba backbone. The method introduces three key components: (1) a band split block that partitions the full band into four sub-bands of different widths according to their information similarity; (2) independent weights for each sub-band, which reduce the inference burden on the SSM; and (3) a spectrum restoration block that enhances the representation of cross-band features from multiple perspectives, countering the SSM's tendency to forget low-energy high-frequency information. Evaluated on the DNS Challenge 2021 dataset, CSMamba outperforms several state-of-the-art methods in PESQ, STOI, and SI-SNR while using fewer parameters.
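The band split idea (four contiguous sub-bands of unequal width, each processed with its own weights instead of one shared layer) can be sketched in NumPy. This is a minimal sketch, not the authors' implementation: the band edges in `BAND_EDGES`, the (time, frequency, channel) layout with two real/imaginary channels, and the helper `band_split` are all hypothetical, and a plain linear projection stands in for whatever per-band layers CSMamba actually uses.

```python
import numpy as np

# Hypothetical sub-band edges (in frequency bins) for a 257-bin spectrum.
# The paper chooses its four widths by information similarity; these values
# are placeholders, not the authors' split.
BAND_EDGES = (0, 32, 80, 160, 257)


def band_split(features, weights, edges=BAND_EDGES):
    """Split a (T, F, C) feature map into contiguous sub-bands along the
    frequency axis and project each sub-band with its own (C, D) matrix."""
    if len(weights) != len(edges) - 1:
        raise ValueError("need exactly one weight matrix per sub-band")
    outputs = []
    for lo, hi, w in zip(edges[:-1], edges[1:], weights):
        band = features[:, lo:hi, :]  # (T, hi - lo, C) slice of this sub-band
        outputs.append(band @ w)      # independent (unshared) weights per band
    return outputs


rng = np.random.default_rng(0)
T, F, C, D = 10, 257, 2, 16           # C = 2 assumes real/imaginary channels
features = rng.standard_normal((T, F, C))
weights = [rng.standard_normal((C, D)) for _ in range(len(BAND_EDGES) - 1)]
bands = band_split(features, weights)
print([b.shape for b in bands])
# [(10, 32, 16), (10, 48, 16), (10, 80, 16), (10, 97, 16)]
```

Giving each sub-band its own weight matrix means the low and high bands no longer have to share one set of parameters, which is the motivation the abstract gives for the split.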

📝 Abstract
Recently, the state space model (SSM) represented by Mamba has shown remarkable performance in long-term sequence modeling tasks, including speech enhancement. However, because sub-band features differ substantially, applying the same SSM to all sub-bands limits its inference capability. Additionally, when processing each time frame of the time-frequency representation, the SSM may forget some low-energy high-frequency information, which makes it difficult to restore the structure of the high-frequency bands. To address these issues, we propose Cross- and Sub-band Mamba (CSMamba). To help the SSM handle different sub-band features flexibly, we propose a band split block that splits the full band into four sub-bands of different widths based on their information similarity. We then allocate independent weights to each sub-band, thereby reducing the inference burden on the SSM. Furthermore, to mitigate the SSM's forgetting of low-energy information in the high-frequency bands, we introduce a spectrum restoration block that enhances the representation of cross-band features from multiple perspectives. Experimental results on the DNS Challenge 2021 dataset demonstrate that CSMamba outperforms several state-of-the-art (SOTA) speech enhancement methods in three objective evaluation metrics with fewer parameters.
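The forgetting issue can be illustrated with the simplest possible SSM: a scalar, time-invariant linear recurrence h_k = a*h_{k-1} + b*x_k, y_k = c*h_k, scanned over the frequency bins of one frame. This is a toy sketch under stated assumptions: Mamba uses a selective (input-dependent), multi-dimensional state rather than fixed scalars, the helper `linear_ssm_scan` is hypothetical, and the low-to-high scanning direction is assumed. With |a| < 1, an input at bin k reaches bin n scaled by a**(n - k), so its influence decays geometrically.

```python
import numpy as np


def linear_ssm_scan(x, a, b, c):
    """Scan a scalar linear SSM over a 1-D sequence x (here: the frequency
    bins of one frame). Not Mamba's selective scan; a, b, c are fixed."""
    h = 0.0
    y = np.empty(len(x), dtype=float)
    for k, xk in enumerate(x):
        h = a * h + b * xk   # state update: old state decays by factor a
        y[k] = c * h         # readout from the current state
    return y


x = np.zeros(64)
x[0] = 1.0                   # unit impulse at the first (lowest) bin
y = linear_ssm_scan(x, a=0.9, b=1.0, c=1.0)
print(y[10], 0.9 ** 10)      # both approximately 0.3487
```

Because the recurrence is linear, an input of amplitude eps at bin k contributes eps * a**(n - k) * b * c at bin n, so weak components occupy a correspondingly small share of the state.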
Problem

Research questions and friction points this paper is trying to address.

Enhance speech quality
Handle sub-band differences
Restore high-frequency information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-band Mamba processing
Band split for sub-band features
Spectrum restoration for high-frequencies
Jizhen Li
NERCMS, School of Computer Science, Hubei Luojia Laboratory, Wuhan University, China
Weiping Tu
Wuhan University, Wuhan City, Hubei Prov., China
audio signal processing, artificial intelligence
Yuhong Yang
NERCMS, School of Computer Science, Hubei Luojia Laboratory, Wuhan University, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, China
Xinmeng Xu
NERCMS, School of Computer Science, Hubei Luojia Laboratory, Wuhan University, China
Yiqun Zhang
NERCMS, School of Computer Science, Hubei Luojia Laboratory, Wuhan University, China
Yanzhen Ren
School of Cyber Science and Engineering, Wuhan University, China