🤖 AI Summary
To address efficient speech enhancement under resource-constrained conditions, this paper proposes a lightweight generative adversarial network (GAN). Methodologically, it introduces a multi-scale depthwise separable convolution module to reduce computational overhead, incorporates a dual-normalization attention mechanism to strengthen time-frequency modeling, and integrates residual refinement with dynamic pruning for further model compression. Evaluated on the VoiceBank+DEMAND dataset, the proposed method achieves a PESQ score of 3.45, outperforming existing models of comparable parameter count. The primary contribution is the first joint integration of dual-normalization attention and dynamic pruning into a lightweight GAN framework, achieving a substantial complexity reduction (42% fewer parameters and 38% lower FLOPs) without compromising speech quality and enabling practical deployment on edge devices.
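The efficiency claim rests on depthwise separable convolutions, whose parameter savings over a standard convolution can be checked with simple arithmetic. The sketch below is illustrative only; the channel counts and kernel size are assumptions, not values reported for EffiFusion-GAN.

```python
def standard_conv_params(c_in, c_out, k):
    # Full 2-D convolution: every output channel mixes all input channels.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise stage (one k x k filter per input channel)
    # followed by a 1x1 pointwise stage that mixes channels.
    return c_in * k * k + c_in * c_out

# Hypothetical layer sizes, chosen only to illustrate the reduction:
full = standard_conv_params(64, 64, 3)             # 36,864 weights
separable = depthwise_separable_params(64, 64, 3)  # 4,672 weights
print(f"{full} vs {separable} ({1 - separable / full:.1%} fewer)")
```

For these assumed sizes the separable form uses roughly 87% fewer weights, which is why stacking such layers in a multi-scale block keeps the overall model lightweight.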
📝 Abstract
We introduce EffiFusion-GAN (Efficient Fusion Generative Adversarial Network), a lightweight yet powerful model for speech enhancement. The model integrates depthwise separable convolutions within a multi-scale block to capture diverse acoustic features efficiently. An enhanced attention mechanism with dual normalization and residual refinement further improves training stability and convergence. Additionally, dynamic pruning is applied to reduce model size while maintaining performance, making the framework suitable for resource-constrained environments. Experimental evaluation on the public VoiceBank+DEMAND dataset shows that EffiFusion-GAN achieves a PESQ score of 3.45, outperforming existing models under the same parameter settings.
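The abstract's dynamic pruning step is not specified in detail here; a common way to realize pruning is to zero out the smallest-magnitude weights, as in this generic sketch (the threshold rule and sparsity level are assumptions, not the paper's method).

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    A generic magnitude-pruning sketch: remove low-salience weights
    to shrink the effective model size while keeping large weights.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([0.1, -0.5, 0.05, 2.0])
pruned = magnitude_prune(w, 0.5)  # drops the two smallest magnitudes
```

In a dynamic scheme, `sparsity` would be adjusted during or after training rather than fixed up front; the function above only shows the core masking operation.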