🤖 AI Summary
To address the dual challenges of model compression and robustness to device and environmental-noise variability in Acoustic Scene Classification (ASC) on edge devices, this paper proposes a teacher-ensemble-guided knowledge distillation framework. Methodologically, it introduces (1) a learnable dual-path teacher ensemble with two complementary fusion heads: z₁, which predicts per-teacher mixture weights for each input using a student-style backbone, and z₂, a lightweight MLP that fuses teacher logits per class; and (2) a quantization-ready student network built from depthwise-separable expand-depthwise-project (EDP) blocks with global response normalization and a global pooling head. Evaluated on the TAU Urban Acoustic Scenes 2022 Mobile benchmark, the approach achieves state-of-the-art performance under stringent edge-deployment constraints, namely low latency and compact model size (<1.5M parameters). The design targets robustness to device heterogeneity and environmental acoustic noise while maintaining high accuracy.
📝 Abstract
We present a compact, quantization-ready acoustic scene classification (ASC) framework that couples an efficient student network with a learned teacher ensemble and knowledge distillation. The student backbone uses stacked depthwise-separable "expand-depthwise-project" blocks with global response normalization to stabilize training and improve robustness to device and noise variability, while a global pooling head yields class logits for efficient edge inference. To inject richer inductive bias, we assemble a diverse set of teacher models and learn two complementary fusion heads: z1, which predicts per-teacher mixture weights using a student-style backbone, and z2, a lightweight MLP that performs per-class logit fusion. The student is distilled from the ensemble via temperature-scaled soft targets combined with hard labels, enabling it to approximate the ensemble's decision geometry with a single compact model. Evaluated on the TAU Urban Acoustic Scenes 2022 Mobile benchmark, our approach achieves state-of-the-art (SOTA) results under matched edge-deployment constraints, demonstrating strong performance and practicality for mobile ASC.
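As a rough illustration of the distillation step described above, the following minimal sketch mixes teacher logits with softmax-normalised per-teacher weights (in the spirit of z1) and combines a temperature-scaled soft-target term with a hard-label cross-entropy term. The function names, the `alpha` balance, the default temperature, and the T² scaling of the soft term are assumptions borrowed from common knowledge-distillation practice, not details given in the abstract. The learned weighting head is replaced by scores passed in directly, and the z2 per-class MLP fusion is not modelled.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_teachers(teacher_logits, weight_scores):
    """Mix per-teacher logits using softmax-normalised mixture weights.

    weight_scores is a hypothetical stand-in for the output of a learned
    weighting head; here it is simply supplied by the caller.
    """
    weights = softmax(weight_scores)
    n_classes = len(teacher_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, teacher_logits))
        for c in range(n_classes)
    ]


def distillation_loss(student_logits, ensemble_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label).

    alpha and the T^2 factor are assumed conventions, not taken from the paper.
    """
    p_teacher = softmax(ensemble_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: KL divergence between temperature-softened distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard-label term: cross-entropy of the unsoftened student prediction.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Two hypothetical teachers, three classes; equal scores give a plain average.
teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
fused = fuse_teachers(teachers, [0.0, 0.0])
loss = distillation_loss([1.0, 0.2, -0.3], fused, label=0)
```

With equal weight scores the fused logits reduce to the plain average of the teachers, and when the student's logits match the ensemble's the soft-target term vanishes, leaving only the hard-label term.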