🤖 AI Summary
To address the high communication overhead, neglect of device model heterogeneity, and impractical reliance on public datasets in federated multi-task learning (FMTL) for multi-access edge computing (MEC), this paper proposes FedICT, a federated multi-task distillation framework that requires no public dataset. The method uses bi-directional distillation between clients and the server and integrates two modules: Federated Prior Knowledge Distillation (FPKD) and Local Knowledge Adjustment (LKA). FPKD strengthens each client's fit to its local data by introducing prior knowledge of the local data distribution, while LKA corrects the server's distillation loss so that transferred local knowledge better matches the generalized representation; together they preserve client-specific modeling while mitigating client drift. Experiments on three datasets, under varied data-heterogeneity and model-architecture settings, show that FedICT improves accuracy over all compared benchmarks while using less than 1.2% of the training communication overhead of FedAvg and no more than 75% of the training communication rounds of FedGKT.
📝 Abstract
The growing interest in intelligent services and privacy protection for mobile devices has led to the widespread application of federated learning in Multi-access Edge Computing (MEC). Diverse user behaviors call for personalized services with heterogeneous Machine Learning (ML) models on different devices. Federated Multi-task Learning (FMTL) has been proposed to train related but personalized ML models for different devices, but previous works suffer from excessive communication overhead during training and neglect the model heterogeneity among devices in MEC. Introducing knowledge distillation into FMTL can simultaneously enable efficient communication and model heterogeneity among clients, but existing methods rely on a public dataset, which is impractical in reality. To tackle this dilemma, Federated MultI-task Distillation for Multi-access Edge CompuTing (FedICT) is proposed. FedICT keeps local and global knowledge apart during the bi-directional distillation processes between clients and the server, aiming to support multi-task clients while alleviating the client drift caused by divergent optimization directions of client-side local models. Specifically, FedICT includes Federated Prior Knowledge Distillation (FPKD) and Local Knowledge Adjustment (LKA). FPKD reinforces the clients’ fitting of local data by introducing prior knowledge of local data distributions. LKA, in turn, corrects the distillation loss of the server, making the transferred local knowledge better match the generalized representation. Extensive experiments on three datasets demonstrate that FedICT significantly outperforms all compared benchmarks across various data-heterogeneity and model-architecture settings, achieving improved accuracy with less than 1.2% of the training communication overhead of FedAvg and no more than 75% of the training communication rounds of FedGKT in all considered scenarios.
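The abstract does not give the FPKD or LKA loss functions, so the following is only a minimal sketch of the generic logit-based knowledge distillation loss that such frameworks typically build on. It is not FedICT's actual objective. The function names, the temperature value, and the T² scaling convention are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Generic distillation loss (assumed form, not taken from the paper).
    The T^2 factor is a common convention that keeps gradient magnitudes
    comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher (target) distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for one sample with three classes.
server_logits = [2.0, 0.5, -1.0]
client_logits = [1.0, 1.0, 0.0]
loss = distillation_loss(client_logits, server_logits)
```

In a bi-directional setup, the same function could be applied in both directions by swapping which side acts as teacher. The prior-knowledge weighting in FPKD and the loss correction in LKA would modify this basic form in ways the abstract does not detail.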