🤖 AI Summary
To address data privacy concerns, cross-institutional data heterogeneity, and high computational overhead in federated deployment of vision-language models (VLMs) for medical image classification, this paper proposes a lightweight and efficient federated vision-language framework. Methodologically, it freezes the pre-trained CLIP encoders to preserve transferable knowledge; introduces a masked feature adaptation module (FAM) and private masked MLP classifiers for local personalization; employs an adaptive KL-divergence-based distillation regularizer that lets the FAM and MLP heads learn from each other; and integrates model compression with ensemble prediction to sharply reduce communication and computational costs. Evaluated on four public medical imaging datasets, the method achieves an 8% accuracy improvement over the second-best baseline on ISIC2019 and trains 120× faster than FedAvg, striking a favorable trade-off among privacy preservation, generalization capability, and resource efficiency.
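The mutual-learning regularizer described above can be sketched as a symmetric KL divergence between the softened predictions of the two heads. This is a minimal illustration, not the paper's implementation: the paper makes the weighting adaptive, whereas here `alpha` and `temperature` are fixed placeholder hyperparameters.

```python
import math

def softmax(logits, temperature=1.0):
    """Softened class distribution; higher temperature -> flatter."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(max(pi, eps) / max(qi, eps)) for pi, qi in zip(p, q))

def mutual_distill_loss(fam_logits, mlp_logits, alpha=0.5, temperature=2.0):
    # Symmetric KL so each head regularizes the other. The paper's
    # weighting is adaptive; `alpha` here is a fixed placeholder.
    p = softmax(fam_logits, temperature)
    q = softmax(mlp_logits, temperature)
    return alpha * kl(p, q) + (1 - alpha) * kl(q, p)
```

The loss vanishes when the two heads agree and grows as their predictions diverge, pulling the shared FAM and the private MLP toward consistent outputs on local data.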
📝 Abstract
Despite the remarkable performance of deep models in medical imaging, they still require source data for training, which limits their potential in light of privacy concerns. Federated learning (FL), as a decentralized learning framework that trains a shared model across multiple hospitals (a.k.a. FL clients), provides a feasible solution. However, data heterogeneity and resource costs hinder the deployment of FL models, especially when using vision-language models (VLMs). To address these challenges, we propose a novel contrastive language-image pre-training (CLIP) based FL approach for medical image classification (FedMedCLIP). Specifically, we introduce a masked feature adaptation module (FAM) as a communication module to reduce the communication load while freezing the CLIP encoders to reduce the computational overhead. Furthermore, we propose a masked multi-layer perceptron (MLP) as a private local classifier to adapt to the client tasks. Moreover, we design an adaptive Kullback-Leibler (KL) divergence-based distillation regularization method to enable mutual learning between the FAM and the MLP. Finally, we incorporate model compression to transmit the FAM parameters while using ensemble predictions for classification. Extensive experiments on four publicly available medical datasets demonstrate that our model achieves strong performance (e.g., 8% higher accuracy than the second-best baseline on ISIC2019) at reasonable resource cost (e.g., 120$\times$ faster training than FedAvg).
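The ensemble prediction step at inference time can be illustrated by blending the class distributions of the shared FAM head and the private MLP head. This is a hedged sketch under assumptions: the paper does not state the exact combination rule, so the simple weighted average and the `weight` parameter below are placeholders.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_predict(fam_logits, mlp_logits, weight=0.5):
    """Blend the shared FAM head and the private MLP head, then take
    the argmax class. `weight` is an illustrative placeholder."""
    probs = [weight * pf + (1 - weight) * pm
             for pf, pm in zip(softmax(fam_logits), softmax(mlp_logits))]
    return max(range(len(probs)), key=probs.__getitem__), probs
```

Because the blend is a convex combination of two distributions, the output remains a valid distribution, and each client benefits from both the globally communicated FAM and its locally personalized MLP.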