🤖 AI Summary
Real-world federated learning faces dual heterogeneity—non-IID data distributions and divergent client model architectures—particularly challenging in multimodal settings.
Method: We propose the first personalized federated learning (PFL) framework tailored to multimodal scenarios. Our approach introduces a task-similarity-aware personalized aggregation mechanism and a dimension-invariant knowledge-sharing module, combining multimodal representation learning with knowledge distillation across heterogeneous models to enable privacy-preserving, cross-task collaborative modeling.
Contribution/Results: We establish the first benchmark for multimodal PFL, comprising 40 real-world tasks, and systematically evaluate robustness against dual heterogeneity. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches: it simultaneously enhances client-specific performance and improves global generalization, validating its effectiveness in balancing personalization and federation.
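The summary does not spell out how task-similarity-aware aggregation works; one plausible instantiation is to give each client a customized global model built as a similarity-weighted average of all clients' parameters. The sketch below assumes cosine similarity between per-client task embeddings and a softmax over those similarities; the function names, the embeddings, and the temperature are illustrative assumptions, not details from the paper.

```python
import numpy as np

def task_similarity_weights(task_embs, client_id, temperature=1.0):
    """Softmax over cosine similarities between the target client's task
    embedding and every client's embedding (hypothetical weighting scheme)."""
    target = task_embs[client_id]
    sims = np.array([
        float(np.dot(target, e) / (np.linalg.norm(target) * np.linalg.norm(e)))
        for e in task_embs
    ])
    exp = np.exp(sims / temperature)
    return exp / exp.sum()

def personalized_aggregate(client_params, weights):
    """Client-specific global model: weighted average of parameter vectors."""
    return sum(w * p for w, p in zip(weights, client_params))
```

With this weighting, a client whose task embedding is close to another's contributes more to that client's personalized global model than a dissimilar one.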
📝 Abstract
Foundation models have shown remarkable capabilities across diverse multi-modal tasks, but their centralized training raises privacy concerns and incurs high transmission costs. In contrast, federated learning (FL) offers a distributed alternative that requires no data sharing. Recently, driven by the growing demand to personalize AI models for different user purposes, personalized federated learning (PFL) has emerged. PFL allows each client to leverage the knowledge of other clients for further adaptation to individual user preferences, again without sharing data. Despite its potential, most PFL studies remain confined to simulated environments, overlooking the data and model heterogeneity that arise in real-world scenarios. We first consider large data heterogeneity, evaluating on a new benchmark for multi-modal PFL that spans 40 distinct tasks with realistic data distribution shifts. We then consider model heterogeneity, in that we do not assume all clients share similar model architectures. To address data heterogeneity, we propose a task-similarity-aware model aggregation method that provides customized global models to each client. For model heterogeneity, we propose a dimension-invariant module that enables knowledge sharing across heterogeneous models. Empirical validation demonstrates that the proposed approach outperforms the state of the art, excelling in both personalization and generalization.
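The dimension-invariant module is described only at a high level; one common way to realize the idea is to project each heterogeneous model's hidden representation into a fixed shared space where a sharing (distillation-style) loss can be computed regardless of each client's native hidden size. The sketch below is a minimal illustration under that assumption; `DimInvariantHead`, `sharing_loss`, and the shared dimension of 64 are hypothetical names and choices, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

class DimInvariantHead:
    """Linear map from a client's own hidden size to a shared dimension,
    so clients with different architectures can compare representations.
    (Illustrative: the paper's module details are not specified here.)"""
    def __init__(self, in_dim, shared_dim=64):
        self.W = rng.standard_normal((in_dim, shared_dim)) / np.sqrt(in_dim)

    def __call__(self, h):
        z = h @ self.W
        return z / (np.linalg.norm(z) + 1e-8)  # unit-normalize for stability

def sharing_loss(z_a, z_b):
    """Squared distance between two clients' shared-space representations."""
    return float(np.sum((z_a - z_b) ** 2))
```

Because every head outputs the same shared dimension, a 16-dimensional model and a 128-dimensional model can exchange knowledge through this loss without sharing raw data or matching architectures.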