🤖 AI Summary
To address efficient speech enhancement under resource-constrained conditions, this paper proposes a lightweight generative adversarial network (GAN). Methodologically, it introduces a multi-scale depthwise separable convolution module to reduce computational overhead, incorporates a dual-normalization attention mechanism to strengthen time-frequency modeling, and integrates residual refinement with dynamic pruning for further model compression. Evaluated on the VoiceBank+DEMAND dataset, the proposed method achieves a PESQ score of 3.45, outperforming existing models of comparable parameter count. The primary contribution is the first joint integration of dual-normalization attention and dynamic pruning into a lightweight GAN framework, achieving a substantial complexity reduction (42% fewer parameters and 38% lower FLOPs) without compromising speech quality and enabling practical deployment on edge devices.
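The efficiency claim rests on depthwise separable convolutions, whose parameter savings over a standard convolution can be checked with simple arithmetic. The sketch below is illustrative only; the channel counts and kernel size are assumptions, not values reported for EffiFusion-GAN.

```python
def standard_conv_params(c_in, c_out, k):
    # Full 2-D convolution: every output channel mixes all input channels.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise stage (one k x k filter per input channel)
    # followed by a 1x1 pointwise stage that mixes channels.
    return c_in * k * k + c_in * c_out

# Hypothetical layer sizes, chosen only to illustrate the reduction:
full = standard_conv_params(64, 64, 3)             # 36,864 weights
separable = depthwise_separable_params(64, 64, 3)  # 4,672 weights
print(f"{full} vs {separable} ({1 - separable / full:.1%} fewer)")
```

For these assumed sizes the separable form uses roughly 87% fewer weights, which is why stacking such layers in a multi-scale block keeps the overall model lightweight.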
📝 Abstract
We introduce EffiFusion-GAN (Efficient Fusion Generative Adversarial Network), a lightweight yet powerful model for speech enhancement. The model integrates depthwise separable convolutions within a multi-scale block to capture diverse acoustic features efficiently. An enhanced attention mechanism with dual normalization and residual refinement further improves training stability and convergence. Additionally, dynamic pruning is applied to reduce model size while maintaining performance, making the framework suitable for resource-constrained environments. Experimental evaluation on the public VoiceBank+DEMAND dataset shows that EffiFusion-GAN achieves a PESQ score of 3.45, outperforming existing models under the same parameter settings.
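The abstract's dynamic pruning step is not specified in detail here; a common way to realize pruning is to zero out the smallest-magnitude weights, as in this generic sketch (the threshold rule and sparsity level are assumptions, not the paper's method).

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    A generic magnitude-pruning sketch: remove low-salience weights
    to shrink the effective model size while keeping large weights.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([0.1, -0.5, 0.05, 2.0])
pruned = magnitude_prune(w, 0.5)  # drops the two smallest magnitudes
```

In a dynamic scheme, `sparsity` would be adjusted during or after training rather than fixed up front; the function above only shows the core masking operation.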