FedICT: Federated Multi-Task Distillation for Multi-Access Edge Computing

📅 2023-01-01
🏛️ IEEE Transactions on Parallel and Distributed Systems
📈 Citations: 27 (influential: 1)
🤖 AI Summary
To address the high communication overhead, the neglect of device model heterogeneity, and the unrealistic reliance on public datasets in federated multi-task learning (FMTL) for mobile edge computing, this paper proposes FedICT, a public-data-free bidirectional federated multi-task knowledge distillation framework. The method introduces a local-global bidirectional distillation mechanism built from two modules, Federated Prior Knowledge Distillation (FPKD) and Local Knowledge Adjustment (LKA). This design preserves client-specific modeling capability while improving global generalization consistency and mitigating client drift. Extensive experiments on three benchmark datasets show that, compared with baselines including FedAvg and FedGKT, FedICT improves accuracy in all considered scenarios while using less than 1.2% of FedAvg's training communication overhead and no more than 75% of FedGKT's training communication rounds.
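To make the FPKD idea concrete, here is a minimal numpy sketch of what a prior-weighted client objective could look like. The loss form, the `alpha` weight, and all function names are assumptions read off the summary above, not the paper's actual equations.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fpkd_client_loss(logits, labels, global_probs, label_prior, alpha=0.5, T=2.0):
    """Hypothetical FPKD-style objective: cross-entropy on local labels plus a
    distillation term whose target is the server's soft labels reweighted by
    the client's local label prior, so distillation reinforces local fitting."""
    n = logits.shape[0]
    ce = -np.mean(np.log(softmax(logits)[np.arange(n), labels] + 1e-12))
    target = global_probs * label_prior            # assumed prior-weighting form
    target = target / target.sum(axis=-1, keepdims=True)
    student = softmax(logits, T=T)
    kd = np.mean(np.sum(target * (np.log(target + 1e-12)
                                  - np.log(student + 1e-12)), axis=-1))
    return ce + alpha * kd

# Toy usage with random data (10-class task, batch of 8).
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 10))
labels = rng.integers(0, 10, size=8)
global_probs = softmax(rng.normal(size=(8, 10)), T=2.0)
label_prior = softmax(rng.normal(size=10))         # client's class frequencies
print(f"client loss: {fpkd_client_loss(logits, labels, global_probs, label_prior):.4f}")
```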
📝 Abstract
The growing interest in intelligent services and privacy protection for mobile devices has given rise to the widespread application of federated learning in Multi-access Edge Computing (MEC). Diverse user behaviors call for personalized services with heterogeneous Machine Learning (ML) models on different devices. Federated Multi-task Learning (FMTL) is proposed to train related but personalized ML models for different devices, whereas previous works suffer from excessive communication overhead during training and neglect the model heterogeneity among devices in MEC. Introducing knowledge distillation into FMTL can simultaneously enable efficient communication and model heterogeneity among clients, whereas existing methods rely on a public dataset, which is impractical in reality. To tackle this dilemma, Federated MultI-task Distillation for Multi-access Edge CompuTing (FedICT) is proposed. FedICT exchanges local and global knowledge through bi-directional distillation between clients and the server, aiming to support multi-task clients while alleviating the client drift caused by the divergent optimization directions of client-side local models. Specifically, FedICT includes Federated Prior Knowledge Distillation (FPKD) and Local Knowledge Adjustment (LKA). FPKD reinforces the clients' fitting of local data by introducing prior knowledge of local data distributions, while LKA corrects the distillation loss of the server so that the transferred local knowledge better matches the generalized representation. Extensive experiments on three datasets demonstrate that FedICT significantly outperforms all compared baselines under various data heterogeneity and model architecture settings, achieving improved accuracy with less than 1.2% of the training communication overhead of FedAvg and no more than 75% of the training communication rounds of FedGKT in all considered scenarios.
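Taking the abstract's description literally, a communication round exchanges per-sample logits rather than model weights, and LKA adjusts the server-side distillation target toward each client's transferred knowledge. The round structure below is a hedged guess at that protocol; the plain averaging rule and the `beta` blend standing in for LKA are placeholders, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

num_clients, batch, classes = 3, 16, 10

# Clients upload only logits computed by their heterogeneous local models;
# no weights and no public dataset are exchanged.
uploaded = {c: rng.normal(size=(batch, classes)) for c in range(num_clients)}

# Server: form global soft labels from the uploaded logits (placeholder rule:
# a plain average; the paper's aggregation may differ).
global_probs = softmax(np.mean(np.stack(list(uploaded.values())), axis=0), T=2.0)

for c in range(num_clients):
    # LKA (assumed form): blend the global target with the client's own
    # predictions so the server-side distillation loss better matches the
    # knowledge that client actually transferred.
    local_probs = softmax(uploaded[c], T=2.0)
    beta = 0.7                                   # hypothetical blending weight
    adjusted = beta * global_probs + (1 - beta) * local_probs
    kd = np.mean(np.sum(adjusted * (np.log(adjusted + 1e-12)
                                    - np.log(local_probs + 1e-12)), axis=-1))
    print(f"client {c}: server-side distillation loss = {kd:.4f}")
```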
Problem

Research questions and friction points this paper is trying to address.

Diverse user behaviors in MEC call for personalized services backed by heterogeneous ML models on different devices
Existing FMTL methods incur excessive communication overhead during training and ignore model heterogeneity among devices
Distillation-based alternatives assume a shared public dataset, which is impractical in real deployments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated multi-task distillation for edge computing
Bi-directional local-global knowledge transfer without any public dataset (see the communication sketch after this list)
Local Knowledge Adjustment (LKA) to correct the server-side distillation loss
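The communication figures in the abstract follow from exchanging logits instead of weights. A back-of-envelope comparison, with illustrative model and dataset sizes that are not the paper's experimental setup:

```python
BYTES = 4                        # float32
params = 11_000_000              # e.g., a ResNet-18-scale client model
samples, classes = 5_000, 10     # local dataset size and number of classes

fedavg_per_round = 2 * params * BYTES              # upload + download full weights
logit_per_round = 2 * samples * classes * BYTES    # upload + download per-sample logits

print(f"FedAvg : {fedavg_per_round / 1e6:.1f} MB/round")
print(f"logits : {logit_per_round / 1e6:.2f} MB/round "
      f"({logit_per_round / fedavg_per_round:.2%} of FedAvg)")
```

With these assumed sizes the logit exchange comes to roughly 0.45% of FedAvg's per-round traffic, comfortably inside the sub-1.2% figure the abstract reports; the exact ratio depends on model size and local data volume.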
Zhiyuan Wu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Sheng Sun
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yuwei Wang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Min Liu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Zhongguancun Laboratory, Beijing, China
Quyang Pan
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Xuefeng Jiang
Institute of Computing Technology, Chinese Academy of Sciences
Weakly-supervised Learning, Distributed Optimization, Autonomous Driving, Noisy Label Learning
Bo Gao
School of Computer and Information Technology, and the Engineering Research Center of Network Management Technology for High-Speed Railway of Ministry of Education, Beijing Jiaotong University, Beijing, China