🤖 AI Summary
To address the limited noise adaptability of hearing aids in resource-constrained scenarios, this paper proposes DFingerNet: a lightweight, context-aware online speech enhancement model. Methodologically, it introduces, for the first time, an external noise recording–driven contextual modeling mechanism into the DeepFilterNet architecture, combining noise-conditioned modeling with lightweight feature distillation to achieve environment-specific denoising without increasing inference overhead. The key contribution lies in overcoming the generalization bottleneck of single-model approaches, enabling real-time adaptive inference on low-compute devices. Evaluated on the DNS Challenge benchmark, DFingerNet achieves a 1.23-point PESQ improvement and a 3.8 percentage-point STOI gain over the original DeepFilterNet, while maintaining real-time performance and strong robustness.
📝 Abstract
The **DeepFilterNet** (**DFN**) architecture was recently proposed as a deep learning model suited for hearing aid devices. Despite its competitive performance on numerous benchmarks, it still follows a 'one-size-fits-all' approach, training a single, monolithic architecture intended to generalise across different noises and environments. However, its limited size and computation budget can hamper its generalisability. Recent work has shown that in-context adaptation can mitigate this by conditioning the denoising process on additional information extracted from background recordings. These recordings can be processed outside the hearing aid, improving performance while adding minimal computational overhead. We introduce these principles to the **DFN** model, proposing the **DFingerNet** (**DFiN**) model, which shows superior performance on various benchmarks inspired by the DNS Challenge.
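To make the conditioning idea concrete, here is a minimal, hypothetical sketch of what "conditioning denoising on a background recording" can look like in its simplest classical form: a noise embedding (here, just an average magnitude spectrum) is computed offline from an external background recording, then used at inference time to drive a Wiener-style spectral gain. The function names, frame size, and the choice of a plain spectral statistic as the "context" are illustrative assumptions, not the actual DFiN mechanism, which uses a learned neural conditioning pathway.

```python
import numpy as np

FRAME = 256  # illustrative frame length, not taken from the paper

def noise_embedding(background: np.ndarray) -> np.ndarray:
    """Summarise an external background recording as an average
    magnitude spectrum. Stands in for the context vector that would
    be computed off-device in an in-context adaptation setup."""
    n = len(background) // FRAME * FRAME
    frames = background[:n].reshape(-1, FRAME)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def denoise(noisy: np.ndarray, noise_mag: np.ndarray) -> np.ndarray:
    """Apply a Wiener-style gain per frequency bin, conditioned on the
    precomputed noise embedding (power spectral subtraction)."""
    n = len(noisy) // FRAME * FRAME
    frames = noisy[:n].reshape(-1, FRAME)
    spec = np.fft.rfft(frames, axis=1)
    power = np.abs(spec) ** 2
    gain = np.maximum(power - noise_mag**2, 0.0) / (power + 1e-12)
    return np.fft.irfft(spec * gain, n=FRAME, axis=1).ravel()

# Toy usage: a sine "speech" signal in stationary white noise, with a
# separate noise-only recording serving as the background context.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.5 * rng.standard_normal(len(t))
context = 0.5 * rng.standard_normal(len(t))  # background-only recording

enhanced = denoise(noisy, noise_embedding(context))
```

The point of the sketch is the division of labour: the (potentially expensive) context extraction runs on the background recording outside the real-time path, while the per-frame enhancement only applies a cheap conditioned gain, which is the same computational argument the abstract makes for offloading recordings from the hearing aid.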