Ensemble-Guided Distillation for Compact and Robust Acoustic Scene Classification on Edge Devices

📅 2025-12-15

🤖 AI Summary

To address the dual challenges of model compactness and robustness to device and environmental-noise variability in Acoustic Scene Classification (ASC) on edge devices, this paper proposes a teacher-ensemble-guided knowledge distillation framework. It introduces (1) a learnable teacher ensemble with two complementary fusion heads: z₁, which predicts per-teacher mixture weights, and z₂, a lightweight MLP that fuses teacher logits per class; and (2) a quantization-ready student network built from depthwise-separable expand-depthwise-project (EDP) blocks with global response normalization and a global pooling head. Evaluated on the TAU Urban Acoustic Scenes 2022 Mobile benchmark, the approach reports state-of-the-art performance under stringent edge-deployment constraints, namely low latency and a compact model size (<1.5M parameters), while improving robustness to device heterogeneity and environmental acoustic noise.
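The two fusion heads can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: the function names, the single-hidden-layer MLP shared across classes, and the weights passed in by hand are all assumptions made for illustration. In the paper the per-teacher weights are predicted from the input by a student-style backbone, whereas here the raw weight scores are simply supplied as arguments.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_fusion(teacher_logits, weight_scores):
    """z1-style fusion (assumed form): one softmax weight per teacher,
    applied to that teacher's whole logit vector."""
    weights = softmax(weight_scores)
    num_classes = len(teacher_logits[0])
    return [
        sum(w * t[c] for w, t in zip(weights, teacher_logits))
        for c in range(num_classes)
    ]

def per_class_mlp_fusion(teacher_logits, w_hidden, b_hidden, w_out, b_out):
    """z2-style fusion (assumed form): a tiny MLP maps the K teacher
    logits of each class to a single fused logit for that class."""
    num_classes = len(teacher_logits[0])
    fused = []
    for c in range(num_classes):
        column = [t[c] for t in teacher_logits]  # K logits for class c
        hidden = [
            max(0.0, sum(w * x for w, x in zip(row, column)) + b)  # ReLU
            for row, b in zip(w_hidden, b_hidden)
        ]
        fused.append(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return fused

# Two hypothetical teachers, two classes.
teachers = [[2.0, 0.0], [0.0, 2.0]]
z1 = mixture_fusion(teachers, weight_scores=[0.0, 0.0])
z2 = per_class_mlp_fusion(teachers, [[1.0, 1.0]], [0.0], [0.5], 0.0)
```

How z₁ and z₂ are combined into the final teacher target is not stated in the summary, so the sketch leaves them as two separate outputs.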

📝 Abstract

We present a compact, quantization-ready acoustic scene classification (ASC) framework that couples an efficient student network with a learned teacher ensemble and knowledge distillation. The student backbone uses stacked depthwise-separable "expand-depthwise-project" blocks with global response normalization to stabilize training and improve robustness to device and noise variability, while a global pooling head yields class logits for efficient edge inference. To inject richer inductive bias, we assemble a diverse set of teacher models and learn two complementary fusion heads: z1, which predicts per-teacher mixture weights using a student-style backbone, and z2, a lightweight MLP that performs per-class logit fusion. The student is distilled from the ensemble via temperature-scaled soft targets combined with hard labels, enabling it to approximate the ensemble's decision geometry with a single compact model. Evaluated on the TAU Urban Acoustic Scenes 2022 Mobile benchmark, our approach achieves state-of-the-art (SOTA) results under matched edge-deployment constraints, demonstrating strong performance and practicality for mobile ASC.
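The distillation objective described above, temperature-scaled soft targets combined with hard labels, can be sketched as follows. This is the commonly used formulation (a KL term on softened distributions plus cross-entropy on the true label), not necessarily the exact loss in the paper: the mixing weight `alpha`, the default `temperature`, and the T² scaling of the soft term are assumptions.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label).

    alpha, temperature, and the T^2 factor are illustrative assumptions.
    """
    log_p_student = log_softmax(student_logits, temperature)
    log_p_teacher = log_softmax(teacher_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # Hard-label cross-entropy at temperature 1.
    ce = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

# Hypothetical logits for a three-class example.
loss = distillation_loss([1.0, 0.5, -0.5], [2.0, 0.1, -1.0], label=0)
```

When the student's logits equal the fused teacher logits the KL term vanishes and only the hard-label term remains, which is why the hard labels still shape training even once the student matches the ensemble.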

Problem

Research questions and friction points this paper is trying to address.

Develops compact acoustic scene classification for edge devices

Enhances robustness to device and noise variability

Distills ensemble knowledge into a single efficient model

Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble-guided distillation for compact edge ASC

Depthwise-separable blocks with global response normalization

Dual teacher fusion heads for robust knowledge transfer
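For readers unfamiliar with global response normalization (GRN), the sketch below follows the formulation introduced with ConvNeXt V2: each channel's global L2 norm is divided by the mean norm across channels, and the result rescales the features inside a residual connection. Whether the paper uses exactly this variant is an assumption, and the channel-first nested-list layout is chosen only to keep the example dependency-free.

```python
import math

def global_response_norm(x, gamma, beta, eps=1e-6):
    """GRN over a feature map x laid out as [channel][freq][time].

    gamma and beta are per-channel learnable parameters; with both at
    zero the layer reduces to the identity.
    """
    # Global L2 norm of each channel over its spatial positions.
    norms = [math.sqrt(sum(v * v for row in ch for v in row)) for ch in x]
    # Divisive normalization by the mean norm across channels.
    mean_norm = sum(norms) / len(norms)
    scales = [n / (mean_norm + eps) for n in norms]
    # Calibrate the features and add the residual input.
    return [
        [[g * v * s + b + v for v in row] for row in ch]
        for ch, s, g, b in zip(x, scales, gamma, beta)
    ]

# Two channels, one frequency bin, two time frames (hypothetical values).
features = [[[3.0, 4.0]], [[0.0, 0.0]]]
out = global_response_norm(features, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

In this toy input the first channel carries all the energy, so it is amplified relative to the silent second channel; this cross-channel competition is the behavior GRN is designed to encourage.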
