Ensemble-Guided Distillation for Compact and Robust Acoustic Scene Classification on Edge Devices

📅 2025-12-15
🤖 AI Summary
To address the dual challenges of model compactness and robustness to device and environmental-noise variability in Acoustic Scene Classification (ASC) on edge devices, this paper proposes a teacher-ensemble-guided knowledge distillation framework. It introduces (1) a learned teacher ensemble with two complementary fusion heads: z₁, which predicts per-teacher mixture weights using a student-style backbone, and z₂, a lightweight MLP that performs per-class logit fusion; and (2) a quantization-ready student network built from depthwise-separable expand-depthwise-project (EDP) blocks with global response normalization and a global pooling head that yields class logits. Evaluated on the TAU Urban Acoustic Scenes 2022 Mobile benchmark, the approach is reported to achieve state-of-the-art performance under matched edge-deployment constraints, namely low latency and compact model size (<1.5M parameters). The design targets robustness to device heterogeneity and environmental acoustic noise while maintaining accuracy.

📝 Abstract
We present a compact, quantization-ready acoustic scene classification (ASC) framework that couples an efficient student network with a learned teacher ensemble and knowledge distillation. The student backbone uses stacked depthwise-separable "expand-depthwise-project" blocks with global response normalization to stabilize training and improve robustness to device and noise variability, while a global pooling head yields class logits for efficient edge inference. To inject richer inductive bias, we assemble a diverse set of teacher models and learn two complementary fusion heads: z1, which predicts per-teacher mixture weights using a student-style backbone, and z2, a lightweight MLP that performs per-class logit fusion. The student is distilled from the ensemble via temperature-scaled soft targets combined with hard labels, enabling it to approximate the ensemble's decision geometry with a single compact model. Evaluated on the TAU Urban Acoustic Scenes 2022 Mobile benchmark, our approach achieves state-of-the-art (SOTA) results under matched edge-deployment constraints, demonstrating strong performance and practicality for mobile ASC.
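The distillation objective described in the abstract (temperature-scaled soft targets combined with hard labels) can be sketched as below. This is a minimal illustration of the standard formulation, not the authors' code: the abstract does not state the weighting, so the mixing factor `alpha`, the `temperature` default, the T² scaling of the soft term, and the function names are all assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy.

    teacher_logits stands in for the fused ensemble logits.
    alpha and temperature are hypothetical hyperparameters (assumed values).
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # Soft term: KL(teacher || student) on temperature-softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Hard term: cross-entropy against the ground-truth label at T = 1.
    ce = -log_softmax(student_logits)[label]

    # T^2 keeps the soft-term gradient scale comparable across temperatures
    # (a common convention, assumed here).
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce
```

When the student logits equal the ensemble logits the soft term vanishes, so only the hard-label term contributes; raising `temperature` flattens both distributions and puts more weight on the relative ordering of non-target classes.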
Problem

Research questions and friction points this paper is trying to address.

Develops compact acoustic scene classification for edge devices
Enhances robustness to device and noise variability
Distills ensemble knowledge into a single efficient model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble-guided distillation for compact edge ASC
Depthwise-separable blocks with global response normalization
Dual teacher fusion heads for robust knowledge transfer
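The two teacher fusion heads can be illustrated with a small sketch. It follows only the abstract's one-line descriptions (z1 predicts per-teacher mixture weights, z2 is an MLP doing per-class logit fusion); everything else is an assumption: the function names, the softmax over teacher scores, the single ReLU hidden layer, sharing one MLP across classes, and passing the z1 scores in directly rather than computing them with a student-style backbone.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_mixture(teacher_logits, weight_scores):
    """z1-style fusion (assumed form): one weight per teacher for this sample.

    teacher_logits: K lists of C logits, one list per teacher.
    weight_scores:  K raw scores; in the paper these come from a
                    student-style backbone, here they are given directly.
    """
    weights = softmax(weight_scores)
    num_classes = len(teacher_logits[0])
    return [sum(w * t[c] for w, t in zip(weights, teacher_logits))
            for c in range(num_classes)]


def fuse_per_class_mlp(teacher_logits, w1, b1, w2, b2):
    """z2-style fusion (assumed form): a tiny MLP applied class by class.

    For each class, the K teacher logits of that class pass through one
    ReLU hidden layer (w1, b1) and a linear output (w2, b2).
    """
    num_classes = len(teacher_logits[0])
    fused = []
    for c in range(num_classes):
        x = [t[c] for t in teacher_logits]
        hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
                  for row, b in zip(w1, b1)]
        fused.append(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return fused
```

With equal scores, `fuse_mixture` reduces to plain logit averaging, which makes uniform ensembling a special case the learned weights can depart from. How the paper combines the z1 and z2 outputs into the final distillation target is not stated in this summary, so the sketch leaves them separate.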
Authors
Hossein Sharify
Electrical Engineering Department, Sharif University of Technology, Tehran, Iran
Behnam Raoufi
Electrical Engineering Department, Sharif University of Technology, Tehran, Iran
Mahdy Ramezani
Electrical Engineering Department, Sharif University of Technology, Tehran, Iran
Khosrow Hajsadeghi
Electrical Engineering Department, Sharif University of Technology, Tehran, Iran
Saeed Bagheri Shouraki
Professor of Electrical Engineering, Sharif University
Fuzzy, ANN, Control, Robotics, AI