A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the insufficient modeling of target time-frequency (TF) units in single-channel real-time speech enhancement. We propose HDF-Net, a two-stage hierarchical deep filtering framework. Methodologically, it introduces a time-frequency decoupled two-stage deep filtering mechanism, incorporates a lightweight TAConv module to strengthen local TF feature extraction, and employs a hierarchical network architecture to jointly model target TF bins and their contextual neighborhoods. Compared with state-of-the-art methods, HDF-Net achieves significant improvements in DNSMOS, STOI, and PESQ scores, yielding superior speech quality and intelligibility. Moreover, it reduces model parameters by 32% and computational cost by 27%, achieving a favorable trade-off between performance and latency that makes it well suited for real-time deployment on edge devices.

📝 Abstract
This paper proposes a model that integrates sub-band processing and deep filtering to fully exploit information from the target time-frequency (TF) bin and its surrounding TF bins for single-channel speech enhancement. The sub-band module captures surrounding frequency bin information at the input, while the deep filtering module applies filtering at the output to both the target TF bin and its surrounding TF bins. To further improve the model performance, we decouple deep filtering into temporal and frequency components and introduce a two-stage framework, reducing the complexity of filter coefficient prediction at each stage. Additionally, we propose the TAConv module to strengthen convolutional feature extraction. Experimental results demonstrate that the proposed hierarchical deep filtering network (HDF-Net) effectively utilizes surrounding TF bin information and outperforms other advanced systems while using fewer resources.
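The decoupled two-stage deep filtering described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the filter lengths, the causal/centered tap layout, and the random filter coefficients are placeholder assumptions (in HDF-Net the per-bin coefficients would be predicted by the network, one stage at a time).

```python
import numpy as np

rng = np.random.default_rng(0)
T, F = 100, 257        # time frames, frequency bins
Lt, Lf = 3, 3          # temporal taps (causal), frequency taps (centered)

# Noisy complex spectrogram standing in for an STFT.
X = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))

# Placeholder complex filter coefficients; in HDF-Net these would be
# predicted per TF bin by the network, stage by stage.
H_t = rng.standard_normal((T, F, Lt)) + 1j * rng.standard_normal((T, F, Lt))
H_f = rng.standard_normal((T, F, Lf)) + 1j * rng.standard_normal((T, F, Lf))

def temporal_filter(X, H_t):
    """Stage 1: causal filtering over the current and past frames per bin."""
    T, _ = X.shape
    Y = np.zeros_like(X)
    for i in range(H_t.shape[-1]):
        shifted = np.zeros_like(X)
        shifted[i:, :] = X[:T - i, :]   # tap i looks i frames into the past
        Y += H_t[:, :, i] * shifted
    return Y

def frequency_filter(Y, H_f):
    """Stage 2: filtering over a centered neighbourhood of frequency bins."""
    _, F = Y.shape
    half = H_f.shape[-1] // 2
    Z = np.zeros_like(Y)
    for j in range(H_f.shape[-1]):
        off = j - half                  # offset from the target bin
        if off >= 0:
            dst, src = slice(0, F - off), slice(off, F)
        else:
            dst, src = slice(-off, F), slice(0, F + off)
        Z[:, dst] += H_f[:, dst, j] * Y[:, src]
    return Z

S_hat = frequency_filter(temporal_filter(X, H_t), H_f)
print(S_hat.shape)  # (100, 257)
```

Splitting the full 2-D deep filter into a temporal pass followed by a frequency pass means each stage only has to predict `Lt` (or `Lf`) coefficients per bin instead of `Lt * Lf`, which is the complexity reduction the abstract refers to.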
Problem

Research questions and friction points this paper is trying to address.

Insufficient modeling of the target TF bin in real-time single-channel speech enhancement
How to combine sub-band processing and deep filtering so surrounding TF bin information is fully exploited
High complexity of predicting full deep-filter coefficients in a single stage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage hierarchical deep filtering framework
Decouples filtering into temporal and frequency components
TAConv module enhances convolutional feature extraction
Shenghui Lu
Xiamen University
Speech enhancement · Speech recognition
Hukai Huang
School of Informatics, Xiamen University, China
Jinanglong Yao
School of Informatics, Xiamen University, China
Kaidi Wang
School of Informatics, Xiamen University, China
Q. Hong
School of Informatics, Xiamen University, China
Lin Li
School of Electronic Science and Engineering, Xiamen University, China