🤖 AI Summary
Keyword spotting (KWS) on 8 kHz narrowband audio acquired in non-IID environments is harder than the setting most mainstream KWS systems address: bandwidth is limited, the data distribution shifts between recording environments, and edge devices impose strict resource constraints.
Method: We propose a lightweight cascaded multi-instance learning (MIL) framework tailored for edge deployment. It is the first to integrate MIL with a cascaded deep neural network (DNN), incorporating an early-exit mechanism to mitigate class imbalance and reduce computational overhead. Robustness is enhanced by fusing Mel spectrograms, MFCCs, and periodicity features.
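The early-exit idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's model: the scoring functions `energy_score` and `peak_score`, the thresholds, and the sample inputs are hypothetical stand-ins for the trained DNN stages, which the summary does not specify.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is a (scoring function, acceptance threshold) pair.
# In a real system each scoring function would be a trained classifier;
# the simple functions below are hypothetical placeholders.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_predict(features: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Run stages in order and stop at the first one that rejects.

    Returns (is_keyword, number_of_stages_evaluated).
    """
    for index, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(features) < threshold:
            # Early exit: later (more expensive) stages are never run.
            return False, index
    return True, len(stages)


def energy_score(x: Sequence[float]) -> float:
    """Cheap first-stage proxy: mean squared amplitude (assumes non-empty x)."""
    return sum(v * v for v in x) / len(x)


def peak_score(x: Sequence[float]) -> float:
    """Stand-in for a costlier second stage: peak absolute amplitude."""
    return max(abs(v) for v in x)


# Hypothetical thresholds, chosen only for this illustration.
stages: List[Stage] = [(energy_score, 0.01), (peak_score, 0.5)]

silence = [0.0, 0.01, -0.01, 0.0]
quiet = [0.2, -0.2, 0.2, -0.2]
loud = [0.6, -0.7, 0.8, -0.5]

for name, clip in [("silence", silence), ("quiet", quiet), ("loud", loud)]:
    print(name, cascade_predict(clip, stages))
```

Because most incoming audio is background rather than the keyword, the cheap first stage rejects the bulk of inputs, so the average cost per input stays close to the cost of that first stage alone.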
Results: Under strict deployment constraints, our system achieves a 6% false rejection rate (FRR) at 0.75 false alarms per hour (FAR), significantly outperforming existing narrowband KWS approaches while maintaining high accuracy and ultra-low power consumption.
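The two reported metrics can be computed as in the sketch below. The counts used here (100 keyword utterances, 6 misses, 3 false alarms over 4 hours of non-keyword audio) are hypothetical numbers chosen only to reproduce the reported operating point; they are not taken from the paper's evaluation.

```python
def false_rejection_rate(missed_keywords: int, total_keywords: int) -> float:
    """Fraction of true keyword occurrences the system failed to detect."""
    if total_keywords <= 0:
        raise ValueError("total_keywords must be positive")
    return missed_keywords / total_keywords


def false_alarms_per_hour(false_alarms: int, negative_audio_seconds: float) -> float:
    """Number of spurious detections per hour of non-keyword audio."""
    if negative_audio_seconds <= 0:
        raise ValueError("negative_audio_seconds must be positive")
    return false_alarms / (negative_audio_seconds / 3600.0)


# Hypothetical counts, for illustration only.
frr = false_rejection_rate(missed_keywords=6, total_keywords=100)
fa_per_hour = false_alarms_per_hour(false_alarms=3, negative_audio_seconds=4 * 3600)

print(f"FRR = {frr:.0%}, false alarms per hour = {fa_per_hour:.2f}")
```

The two numbers trade off against each other through the detection threshold, which is why a result is stated as one rate at a fixed value of the other.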
📝 Abstract
We propose using cascaded classifiers for a keyword spotting (KWS) task on narrow-band (NB), 8 kHz audio acquired in non-IID environments, a more challenging task than most state-of-the-art KWS systems face. We present a model that incorporates Deep Neural Networks (DNNs), cascading, multiple-feature representations, and multiple-instance learning. The cascaded classifiers handle the task's class imbalance and reduce power consumption on computationally constrained devices via early termination. The KWS system achieves a false negative rate of 6% at an hourly false positive rate of 0.75.
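In multiple-instance learning, labels are attached to bags of instances rather than to individual instances. A minimal sketch of bag-level scoring follows, assuming each utterance is a bag of per-window scores and that the bag score is the maximum instance score. Max pooling is a common MIL choice used here as an assumption; the abstract does not state which pooling the model uses, and the names `bag_score` and `bag_label` and the 0.5 threshold are hypothetical.

```python
from typing import Sequence


def bag_score(instance_scores: Sequence[float]) -> float:
    """Score a bag (utterance) by its most keyword-like instance (window).

    Max pooling encodes the standard MIL assumption: a bag is positive
    if at least one of its instances is positive.
    """
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_scores)


def bag_label(instance_scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Declare the keyword present if the bag score reaches the threshold."""
    return bag_score(instance_scores) >= threshold


# Hypothetical per-window scores for two utterances.
with_keyword = [0.1, 0.2, 0.9, 0.3]   # one window looks like the keyword
without_keyword = [0.1, 0.2, 0.15]    # no window does

print(bag_label(with_keyword), bag_label(without_keyword))
```

This formulation fits keyword spotting because an utterance-level label says only that the keyword occurs somewhere in the audio, not which window contains it.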