FedMTFI: Feature Importance Based Optimized Multi Teacher Knowledge Distillation in Heterogeneous Federated Learning Environment

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation in heterogeneous federated learning caused by disparities in client computational capabilities and by non-independent and identically distributed (non-IID) data. The authors propose an approach that combines client clustering with multi-teacher knowledge distillation. Clients are first clustered by hardware capacity and model architecture; within each cluster, local models are trained on private client data and aggregated with FedAvg into a teacher prototype, one per cluster. The method incorporates SHAP values into the multi-teacher distillation process, using feature importance to weight teacher contributions while training a global student model. According to the reported experiments, the framework improves interpretability and achieves higher accuracy than traditional federated learning algorithms under non-IID settings.
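The per-cluster aggregation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each client's model is a flat NumPy parameter vector, that clients in a cluster share one architecture, and that cluster membership is already known. The names `fedavg`, `build_prototypes`, and the `cluster`, `weights`, and `n_samples` fields are hypothetical.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg)."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)  # shape: (n_clients, n_params)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()

def build_prototypes(clients):
    """Group clients by cluster key and average the models within each group."""
    clusters = {}
    for client in clients:
        clusters.setdefault(client["cluster"], []).append(client)
    return {
        key: fedavg(
            [m["weights"] for m in members],
            [m["n_samples"] for m in members],
        )
        for key, members in clusters.items()
    }

# Toy data: two clusters whose models have different sizes (hypothetical values).
clients = [
    {"cluster": "small", "weights": np.array([1.0, 2.0]), "n_samples": 10},
    {"cluster": "small", "weights": np.array([3.0, 4.0]), "n_samples": 30},
    {"cluster": "large", "weights": np.array([0.0, 1.0, 2.0]), "n_samples": 5},
]
prototypes = build_prototypes(clients)  # one teacher prototype per cluster
```

Averaging only within a cluster sidesteps the problem that models with different architectures cannot be averaged parameter by parameter; each resulting prototype then acts as one teacher.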

📝 Abstract

Federated learning (FL) is a decentralized approach that enables collaborative model training without exposing raw data. Instead of transferring sensitive data, devices share only model weights, keeping personal data local and secure. However, in real-world settings, the data held by devices is often unevenly distributed, and devices usually differ in computing power and memory capacity. These differences make it harder for FL to maintain consistent performance across the system. To address these issues, we propose FedMTFI, a novel architecture that combines multi-teacher knowledge distillation (MTKD) with feature importance to improve the FL process in heterogeneous environments. In FedMTFI, clients are clustered by similar hardware and model types. Each cluster trains its own model on non-independent and identically distributed (non-IID) data. Within a cluster, every client updates that model using only its own private local data. The server then aggregates the locally trained models in each cluster using FedAvg to form multiple prototype models. These prototypes then serve as teacher models to train a generalized global student model using MTKD. What distinguishes FedMTFI is the integration of Shapley-value-based feature importance (SHAP) to emphasize important features during distillation, which enhances both accuracy and interpretability. Experimental results show that FedMTFI achieves higher accuracy than traditional FL algorithms and performs more effectively under non-IID data conditions.
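To make the distillation step concrete, here is a rough sketch of a multi-teacher distillation loss in which teacher outputs are blended before being compared with the student. It rests on several assumptions that the abstract does not confirm: the loss is a temperature-softened KL divergence (the conventional knowledge distillation form), and `teacher_weights` is a placeholder for whatever importance scores FedMTFI derives from SHAP. How those scores are actually computed and applied is not specified here. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_teacher_kd_loss(student_logits, teacher_logits, teacher_weights,
                          temperature=2.0):
    """KL(blended teacher || student) on softened outputs, averaged over the batch.

    teacher_weights is an assumed stand-in for importance-derived scores;
    it is normalized to sum to 1 before blending.
    """
    w = np.asarray(teacher_weights, dtype=float)
    w = w / w.sum()
    # Shape (n_teachers, batch, classes)
    teacher_probs = np.stack([softmax(t, temperature) for t in teacher_logits])
    target = np.tensordot(w, teacher_probs, axes=1)  # shape (batch, classes)
    student_probs = softmax(student_logits, temperature)
    eps = 1e-12  # guards against log(0)
    kl = (target * (np.log(target + eps) - np.log(student_probs + eps))).sum(axis=-1)
    # Conventional T^2 factor keeps gradient magnitudes comparable across temperatures.
    return float(kl.mean() * temperature ** 2)

student = np.array([[2.0, 0.5, -1.0]])
other_teacher = np.array([[0.0, 3.0, 0.0]])

# Teachers that agree with the student give (near) zero loss.
loss_same = multi_teacher_kd_loss(student, [student, student], [1.0, 1.0])
# A disagreeing teacher with a larger weight pulls the target away from the student.
loss_mixed = multi_teacher_kd_loss(student, [student, other_teacher], [0.3, 0.7])
```

In a real training loop this term would typically be combined with a supervised loss and minimized with respect to the student's parameters; that part is omitted here.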

Problem

Research questions and friction points this paper is trying to address.

Federated Learning

Heterogeneous Environment

Non-IID Data

Model Heterogeneity

Data Distribution Skew

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning

Multi-Teacher Knowledge Distillation

Feature Importance