🤖 AI Summary
To address data privacy concerns, cross-institutional data heterogeneity, and high computational overhead in federated deployment of vision-language models (VLMs) for medical image classification, this paper proposes a lightweight and efficient federated vision-language framework. Methodologically, it freezes the pre-trained CLIP encoders to preserve transferable knowledge; introduces a masked feature adaptation module (FAM) and private masked MLP classifiers for local personalization; employs an adaptive KL-divergence-based distillation regularizer that lets the FAM and MLP heads learn from each other; and integrates model compression with ensemble prediction to sharply reduce communication and computational costs. Evaluated on four public medical imaging datasets, the method achieves an 8% accuracy improvement over the second-best baseline on ISIC2019 and trains 120× faster than FedAvg, striking a favorable trade-off among privacy preservation, generalization capability, and resource efficiency.
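The mutual-learning regularizer described above can be sketched as a symmetric KL divergence between the softened predictions of the two heads. This is a minimal illustration, not the paper's implementation: the paper makes the weighting adaptive, whereas here `alpha` and `temperature` are fixed placeholder hyperparameters.

```python
import math

def softmax(logits, temperature=1.0):
    """Softened class distribution; higher temperature -> flatter."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(max(pi, eps) / max(qi, eps)) for pi, qi in zip(p, q))

def mutual_distill_loss(fam_logits, mlp_logits, alpha=0.5, temperature=2.0):
    # Symmetric KL so each head regularizes the other. The paper's
    # weighting is adaptive; `alpha` here is a fixed placeholder.
    p = softmax(fam_logits, temperature)
    q = softmax(mlp_logits, temperature)
    return alpha * kl(p, q) + (1 - alpha) * kl(q, p)
```

The loss vanishes when the two heads agree and grows as their predictions diverge, pulling the shared FAM and the private MLP toward consistent outputs on local data.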
📝 Abstract
Despite the remarkable performance of deep models in medical imaging, they still require source data for training, which limits their potential in light of privacy concerns. Federated learning (FL), as a decentralized learning framework that trains a shared model across multiple hospitals (a.k.a. FL clients), provides a feasible solution. However, data heterogeneity and resource costs hinder the deployment of FL models, especially when using vision-language models (VLMs). To address these challenges, we propose a novel contrastive language-image pre-training (CLIP) based FL approach for medical image classification (FedMedCLIP). Specifically, we introduce a masked feature adaptation module (FAM) as a communication module to reduce the communication load while freezing the CLIP encoders to reduce the computational overhead. Furthermore, we propose a masked multi-layer perceptron (MLP) as a private local classifier to adapt to the client tasks. Moreover, we design an adaptive Kullback-Leibler (KL) divergence-based distillation regularization method to enable mutual learning between the FAM and the MLP. Finally, we incorporate model compression to transmit the FAM parameters while using ensemble predictions for classification. Extensive experiments on four publicly available medical datasets demonstrate that our model achieves strong performance (e.g., 8% higher accuracy than the second-best baseline on ISIC2019) at reasonable resource cost (e.g., 120$\times$ faster training than FedAvg).
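The ensemble prediction step at inference time can be illustrated by blending the class distributions of the shared FAM head and the private MLP head. This is a hedged sketch under assumptions: the paper does not state the exact combination rule, so the simple weighted average and the `weight` parameter below are placeholders.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_predict(fam_logits, mlp_logits, weight=0.5):
    """Blend the shared FAM head and the private MLP head, then take
    the argmax class. `weight` is an illustrative placeholder."""
    probs = [weight * pf + (1 - weight) * pm
             for pf, pm in zip(softmax(fam_logits), softmax(mlp_logits))]
    return max(range(len(probs)), key=probs.__getitem__), probs
```

Because the blend is a convex combination of two distributions, the output remains a valid distribution, and each client benefits from both the globally communicated FAM and its locally personalized MLP.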