FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer

📅 2025-08-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Speech separation faces high memory consumption and latency when modeling long sequences, owing to the quadratic complexity of standard Transformer attention. To address this, we propose Focused Linear Attention (FLA), a linear-complexity attention mechanism, and integrate it into FLASepformer, a Transformer architecture that combines gated mechanisms with focused linear attention to preserve global modeling capability while delivering substantial efficiency gains. The design draws on the block-wise modeling strategy of SepReformer and the local-global synergy of TF-Locoformer, yielding two variants: FLA-SepReformer and FLA-TFLocoformer. Experiments on multiple benchmark datasets show separation performance matching the state of the art. Moreover, FLA-SepReformer achieves 1.49-2.29x faster inference and cuts GPU memory usage to 15.8-31.9% of the original footprint (a reduction of up to 84.2%), significantly advancing the practical deployment of long-duration speech separation systems.
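The linear-complexity trick behind FLA can be sketched in a few lines: a non-negative feature map phi replaces the softmax, so attention becomes phi(Q) (phi(K)^T V), and reordering the matrix products makes the cost linear rather than quadratic in sequence length. Below is a minimal NumPy sketch under stated assumptions: the focusing map (ReLU features sharpened by a power p, with their norm restored) follows the general idea of focused linear attention, not necessarily the paper's exact formulation, and all names and parameters are illustrative.

```python
import numpy as np

def focused_linear_attention(Q, K, V, p=3, eps=1e-6):
    """Linear-complexity attention sketch (not the paper's exact module).

    Replaces softmax(Q K^T) V with phi(Q) (phi(K)^T V), evaluated
    right-to-left so the cost is O(N d^2) instead of O(N^2 d).
    """
    def phi(X):
        X = np.maximum(X, 0) + eps                         # non-negative features
        norm = np.linalg.norm(X, axis=-1, keepdims=True)   # remember original scale
        Xp = X ** p                                        # power sharpens ("focuses") the weights
        Xp = Xp / (np.linalg.norm(Xp, axis=-1, keepdims=True) + eps)
        return Xp * norm                                   # restore the original norm

    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                    # (d, d_v): summarizes keys/values once
    Z = Qf @ Kf.sum(axis=0)          # per-query normalizer, O(N d)
    return (Qf @ KV) / (Z[:, None] + eps)

# Toy check: output shape matches V; cost grows linearly with N.
N, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = focused_linear_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```

Because the (d, d_v) key-value summary is formed once and reused for every query, doubling the segment length doubles the work instead of quadrupling it, which is the source of the memory and speed gains reported above.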

📝 Abstract
Speech separation must cope with very long time sequences. Past methods reduce sequence lengths and use Transformers to capture global information, but because the attention module has quadratic time complexity, memory usage and inference time still grow sharply with longer segments. To tackle this, we introduce Focused Linear Attention and build FLASepformer, which achieves linear complexity for efficient speech separation. Inspired by SepReformer and TF-Locoformer, we present two variants: FLA-SepReformer and FLA-TFLocoformer. We also add a new Gated module to further improve performance. Experimental results on various datasets show that FLASepformer matches state-of-the-art performance with lower memory consumption and faster inference. FLA-SepReformer-T/B/L speeds up inference by 2.29x, 1.91x, and 1.49x while using only 15.8%, 20.9%, and 31.9% of the original GPU memory, demonstrating our model's effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Efficient speech separation with linear complexity attention
Reducing memory usage and inference time in long sequences
Improving performance with gated modules and focused linear attention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Focused Linear Attention for linear complexity
Gated module to enhance separation performance
Two variants: FLA-SepReformer and FLA-TFLocoformer