A Layered Self-Supervised Knowledge Distillation Framework for Efficient Multimodal Learning on the Edge

📅 2025-06-08
🤖 AI Summary
Problem: Small models deployed on edge devices generalize poorly, and conventional knowledge distillation depends on a large pretrained teacher network. Method: This paper proposes Layered Self-Supervised Knowledge Distillation (LSSKD), a teacher-free framework that attaches lightweight, removable auxiliary classifiers to intermediate feature layers so that self-supervised knowledge is transferred stage by stage. Contribution/Results: LSSKD is presented as the first distillation paradigm to combine three properties: it needs no teacher, it matches knowledge between network stages one-to-one, and its auxiliary structures can be removed after training, so inference incurs no extra computational cost. On CIFAR-100, LSSKD improves accuracy by 4.54% on average over PS-KD and by 1.14% over SSKD; on ImageNet it outperforms HASSKD by 0.32%; and it reports state-of-the-art results in few-shot settings on Tiny ImageNet and CIFAR-100.

📝 Abstract
We introduce the Layered Self-Supervised Knowledge Distillation (LSSKD) framework for training compact deep learning models. Unlike traditional methods that rely on pre-trained teacher networks, our approach appends auxiliary classifiers to intermediate feature maps, generating diverse self-supervised knowledge and enabling one-to-one transfer across different network stages. On CIFAR-100, our method achieves an average improvement of 4.54% over the state-of-the-art PS-KD method and a 1.14% gain over SSKD; on ImageNet, it improves on HASSKD by 0.32%. Experiments on Tiny ImageNet and CIFAR-100 under few-shot learning scenarios also achieve state-of-the-art results. These findings demonstrate that our approach improves model generalization and performance without the need for large over-parameterized teacher networks. Importantly, all auxiliary classifiers can be removed at the inference stage, so they add no computational cost. This makes our model suitable for deploying small language models on affordable low-computing devices. Owing to its lightweight design and adaptability, our framework is particularly suitable for multimodal sensing and cyber-physical environments that require efficient and responsive inference. LSSKD facilitates the development of intelligent agents capable of learning from limited sensory data under weak supervision.
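
The abstract does not spell out the training objective, so the following is only a minimal sketch of a generic stage-wise self-distillation loss, not the authors' formulation. It assumes that each auxiliary classifier is trained with cross-entropy on the label plus a temperature-softened KL term that pulls it toward the final classifier's output. The function names, the direction of transfer, the `temperature` and `alpha` values, and the `temperature ** 2` scaling are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(logits, label):
    """Cross-entropy of the softmax distribution against a class index."""
    return -math.log(softmax(logits)[label] + 1e-12)


def stagewise_loss(final_logits, aux_logits_per_stage, label,
                   temperature=3.0, alpha=1.0):
    """Hypothetical combined loss: supervised terms plus one distillation
    term per auxiliary classifier (assumed form, not taken from the paper)."""
    loss = cross_entropy(final_logits, label)
    soft_target = softmax(final_logits, temperature)
    for aux_logits in aux_logits_per_stage:
        # Supervised term for the auxiliary classifier.
        loss += cross_entropy(aux_logits, label)
        # Softened KL term pulling the auxiliary output toward the final one.
        soft_aux = softmax(aux_logits, temperature)
        loss += alpha * temperature ** 2 * kl_divergence(soft_target, soft_aux)
    return loss


final = [2.0, 0.5, -1.0]
aux_stages = [[1.0, 0.8, -0.2], [1.8, 0.4, -0.9]]
total = stagewise_loss(final, aux_stages, label=0)
```

In this sketch the distillation term vanishes when an auxiliary classifier already agrees with the final one, so only the supervised terms remain.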

Problem

Research questions and friction points this paper is trying to address.

Efficient multimodal learning on edge devices
Self-supervised knowledge distillation without large teacher networks
Compact model deployment on low-computing devices

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses auxiliary classifiers for self-supervised knowledge
Enables one-to-one transfer across network stages
Removes auxiliary classifiers at inference stage
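
The three points above can be illustrated with a toy structure. This is a sketch under stated assumptions rather than the paper's architecture: `StagedNet`, `strip_auxiliary`, and the list-based "stages" and "heads" are hypothetical stand-ins for real network blocks and classifiers. It shows only the structural idea that auxiliary classifiers read intermediate features during training and can be dropped afterwards without changing the main output.

```python
class StagedNet:
    """Toy backbone: a chain of stages, a final head, and optional
    auxiliary heads attached to the earliest stages (hypothetical design)."""

    def __init__(self, stages, final_head, aux_heads):
        self.stages = list(stages)
        self.final_head = final_head
        # aux_heads[i] reads the output of stages[i]; the list may be empty.
        self.aux_heads = list(aux_heads)

    def forward(self, x):
        aux_outputs = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i < len(self.aux_heads):
                # The auxiliary head only observes the features; it never
                # alters what is passed on to the next stage.
                aux_outputs.append(self.aux_heads[i](x))
        return self.final_head(x), aux_outputs

    def strip_auxiliary(self):
        """Drop all auxiliary heads before deployment."""
        self.aux_heads = []


# Stand-in stages and heads operating on plain lists of floats.
net = StagedNet(
    stages=[lambda v: [2 * a for a in v], lambda v: [a + 1 for a in v]],
    final_head=lambda v: sum(v),
    aux_heads=[lambda v: max(v)],
)

train_out, train_aux = net.forward([1.0, 2.0])
net.strip_auxiliary()
infer_out, infer_aux = net.forward([1.0, 2.0])
```

Because the auxiliary heads sit beside the main path rather than on it, removing them leaves the final prediction unchanged and skips their computation entirely at inference.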
Tarique Dahri
Fast School of Computing, National University of Computer and Emerging Sciences, Karachi, Pakistan
Z. Memon
Fast School of Computing, National University of Computer and Emerging Sciences, Karachi, Pakistan
Zhenyu Yu
Universiti Malaya, 50603 Kuala Lumpur, Malaysia
Mohd. Yamani Idna Idris
University of Malaya
Image and Signal Processing, FPGA design, Embedded System, Internet of Things, Information Security
Sheheryar Khan
School of Professional Education and Executive Development, The Hong Kong Polytechnic University, Hong Kong
Sadiq Ahmad
Assistant Professor, COMSATS University Islamabad, Wah Cantt
Smart Grid Technology, Resource Management in Power System, Energy Management and Trading, Blockchain Technology, AI and Machin
Maged Shoman
Intelligent Transportation Systems University of Tennessee-Oak Ridge Innovation Institute’s Energy Storage and Transportation Convergent Research Initiative
Saddam Aziz
Independent Researcher, USA
Rizwan Qureshi
Center for Research in Computer Vision (CRCV), University of Central Florida, Orlando, USA
Cancer Data Science, Responsible AI, Computer Vision, Bioinformatics, Machine Learning