Adaptive Noise Resilient Keyword Spotting Using One-Shot Learning

📅 2025-05-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the degradation in keyword spotting (KWS) performance on resource-constrained embedded devices under dynamic acoustic noise, this paper proposes a one-shot learning framework for noise-adaptive KWS. Methodologically, it starts from a lightweight pre-trained model and combines noise-aware gradient regularization with a one-step fine-tuning strategy, enabling incremental adaptation from a single noisy utterance and one forward-backward pass. The approach is reported to achieve ultra-low latency (<10 ms), minimal memory overhead (<50 KB additional parameters), and strong robustness. Experiments with two pretrained models and three real-world noise sources (SNR ranging from 24 dB to −3 dB) show that the adapted models outperform the pretrained models in all scenarios, with accuracy improvements of 4.9% to 46.0% at SNR ≤ 18 dB. These results support its suitability for real-time, on-device deployment in noisy edge environments.

📝 Abstract
Keyword spotting (KWS) is a key component of smart devices, enabling efficient and intuitive audio interaction. However, standard KWS systems deployed on embedded devices often suffer performance degradation under real-world operating conditions. Resilient KWS systems address this issue by enabling dynamic adaptation, with applications such as adding or replacing keywords, adjusting to specific users, and improving noise robustness. However, deploying resilient, standalone KWS systems with low latency on resource-constrained devices remains challenging due to limited memory and computational resources. This study proposes a low computational approach for continuous noise adaptation of pretrained neural networks used for KWS classification, requiring only 1-shot learning and one epoch. The proposed method was assessed using two pretrained models and three real-world noise sources at signal-to-noise ratios (SNRs) ranging from 24 to -3 dB. The adapted models consistently outperformed the pretrained models across all scenarios, especially at SNR $\leq$ 18 dB, achieving accuracy improvements of 4.9% to 46.0%. These results highlight the efficacy of the proposed methodology while being lightweight enough for deployment on resource-constrained devices.
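The abstract evaluates at SNRs from 24 to -3 dB but does not say how the noisy utterances were constructed. A common way to build such test conditions is to scale a noise recording so that the speech-to-noise power ratio hits a target value, using SNR_dB = 10·log10(P_speech / P_noise). The sketch below illustrates that arithmetic only; the function names (`mix_at_snr`, `measured_snr_db`) are hypothetical and it is not claimed to be the authors' pipeline.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a signal given as a list of floats."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db` (in dB),
    then add it to `speech`. Returns (noisy_signal, scaled_noise).

    Assumption: both signals are lists of floats at the same sample rate.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]  # crop noise to the utterance length
    p_s, p_n = power(speech), power(noise)
    if p_s == 0 or p_n == 0:
        raise ValueError("speech and noise must have non-zero power")
    # SNR_dB = 10*log10(P_s / (g^2 * P_n))  =>  g = sqrt(P_s / (P_n * 10^(SNR_dB/10)))
    gain = math.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def measured_snr_db(speech, noise):
    """SNR in dB between a clean signal and the noise that was added to it."""
    return 10 * math.log10(power(speech) / power(noise))
```

At 0 dB the scaled noise carries the same power as the speech; at -3 dB the noise power is roughly twice the speech power, which is the hardest condition in the reported range.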
Problem

Research questions and friction points this paper is trying to address.

Improving keyword spotting robustness in noisy environments
Enabling adaptive KWS with low computational resources
Achieving high accuracy with one-shot learning adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-shot learning for noise adaptation
Low computational continuous adaptation
Lightweight deployment on resource-constrained devices
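To make "one-shot learning and one epoch" concrete, the sketch below runs a single pass of stochastic gradient descent over one labeled noisy example per keyword class. It assumes, purely for illustration, that only a final softmax layer is updated on top of fixed feature vectors; the abstract does not state which layers the authors adapt or which optimizer they use, and the names `one_shot_adapt` and `predict_proba` are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_proba(W, b, x):
    """Class probabilities of a linear softmax head: softmax(W x + b)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def one_shot_adapt(W, b, shots, lr=0.1):
    """One epoch of SGD over `shots`, a list of (features, label) pairs holding
    a single noisy example per keyword class. Returns updated copies of W and b.

    Assumption: only this output layer is trained; the feature extractor that
    produced `features` stays frozen.
    """
    W = [row[:] for row in W]  # copy so the pretrained weights are preserved
    b = b[:]
    for x, y in shots:
        p = predict_proba(W, b, x)
        for k in range(len(W)):
            # Gradient of cross-entropy w.r.t. logit k is p_k - 1[k == y].
            g = p[k] - (1.0 if k == y else 0.0)
            for j in range(len(x)):
                W[k][j] -= lr * g * x[j]
            b[k] -= lr * g
    return W, b
```

Each example costs one forward and one backward pass through a single layer, which is the kind of budget that fits a resource-constrained device; whether this matches the compute profile of the proposed method is not something the abstract confirms.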
Luciano Sebastian Martinez-Rau
Department of Computer and Electrical Engineering, Mid Sweden University, Sundsvall, Sweden
Quynh Nguyen Phuong Vu
Department of Computer and Electrical Engineering, Mid Sweden University, Sundsvall, Sweden
Yuxuan Zhang
Department of Computer and Electrical Engineering, Mid Sweden University, Sundsvall, Sweden
Bengt Oelmann
Professor in Electronics
Embedded systems, industrial sensors, mechatronics, low power electronics
Sebastian Bader
Department of Computer and Electrical Engineering, Mid Sweden University, Sundsvall, Sweden