🤖 AI Summary
This work addresses the insufficient modeling of target time-frequency (TF) units in single-channel real-time speech enhancement. We propose HDF-Net, a two-level hierarchical deep filtering framework. Methodologically, it introduces a novel time-frequency decoupled two-stage deep filtering mechanism; incorporates a lightweight TAConv module to enhance local TF feature extraction; and employs a hierarchical network architecture to jointly model target TF bins and their contextual neighborhoods. Compared with state-of-the-art methods, HDF-Net achieves significant improvements in DNSMOS, STOI, and PESQ scores, yielding superior speech quality and intelligibility. Moreover, it reduces model parameters by 32% and computational cost by 27%, achieving a favorable trade-off between performance and latency, which makes it well suited for edge-device deployment in real-time applications.
📝 Abstract
This paper proposes a model that integrates sub-band processing and deep filtering to fully exploit information from the target time-frequency (TF) bin and its surrounding TF bins for single-channel speech enhancement. The sub-band module captures surrounding frequency-bin information at the input, while the deep filtering module applies filtering at the output to both the target TF bin and its surrounding TF bins. To further improve model performance, we decouple deep filtering into temporal and frequency components and introduce a two-stage framework, reducing the complexity of filter-coefficient prediction at each stage. Additionally, we propose the TAConv module to strengthen convolutional feature extraction. Experimental results demonstrate that the proposed hierarchical deep filtering network (HDF-Net) effectively utilizes surrounding TF bin information and outperforms other advanced systems while using fewer resources.
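To make the decoupling idea concrete, the sketch below illustrates generic deep filtering split into a causal temporal stage and a symmetric frequency stage, applied to a complex spectrogram. This is a minimal illustration of the general mechanism, not the paper's implementation: the function names, filter orders (`T` past frames, `±F` neighboring bins), and the assumption that the network has already predicted per-bin complex coefficients `Ht` and `Hf` are all ours.

```python
import numpy as np

def temporal_filter(X, Ht, T):
    """Stage 1: causal filtering along time.
    X:  complex spectrogram, shape (frames, bins)
    Ht: per-bin complex coefficients, shape (frames, bins, T+1),
        where index T corresponds to the current frame."""
    frames, bins = X.shape
    Xp = np.pad(X, ((T, 0), (0, 0)))  # pad only past frames (causal)
    Y = np.zeros_like(X)
    for dt in range(T + 1):
        # dt = 0 is frame t-T, dt = T is the current frame t
        Y += Ht[:, :, dt] * Xp[dt:dt + frames, :]
    return Y

def frequency_filter(X, Hf, F):
    """Stage 2: filtering across neighboring frequency bins.
    Hf: per-bin complex coefficients, shape (frames, bins, 2F+1),
        where index F corresponds to the target bin."""
    frames, bins = X.shape
    Xp = np.pad(X, ((0, 0), (F, F)))  # symmetric pad in frequency
    Y = np.zeros_like(X)
    for df in range(2 * F + 1):
        Y += Hf[:, :, df] * Xp[:, df:df + bins]
    return Y

# Two-stage application: the product of a (T+1)-tap temporal filter and a
# (2F+1)-tap frequency filter replaces one joint (T+1)x(2F+1) filter,
# so each stage predicts far fewer coefficients per TF bin.
```

Each stage predicts `T+1` or `2F+1` coefficients per bin instead of `(T+1)(2F+1)` for a joint TF filter, which is the complexity reduction the abstract refers to.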