FedVLM: Scalable Personalized Vision-Language Models through Federated Learning

📅 2025-07-22
🤖 AI Summary
To address the difficulty of fine-tuning vision-language models (VLMs) and their poor generalization under non-i.i.d. data in federated learning, this paper proposes personalized LoRA (pLoRA). pLoRA preserves the global LoRA's low-rank structure while letting each client dynamically learn client-specific adaptation parameters, enabling privacy-preserving, decentralized personalized fine-tuning. Through lightweight local adaptation and a controllable aggregation mechanism, it balances global consistency against local heterogeneity without increasing communication overhead. Experiments on the RLAIF-V dataset show that pLoRA improves client-specific task performance by 24.5% over standard LoRA, markedly strengthening model adaptability and generalization in non-i.i.d. settings. This work establishes an efficient, scalable, parameter-efficient fine-tuning paradigm for federated multimodal learning.
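The summary above describes pLoRA only at a high level: the global low-rank factors are shared and aggregated, while each client additionally learns lightweight local adaptation parameters. The paper's exact parameterization is not reproduced here, so the following minimal sketch assumes one plausible form, a per-client diagonal scaling `s` inserted between the shared LoRA factors `A` and `B` (the names `PLoRALayer` and `s` are illustrative, not from the paper):

```python
import numpy as np

class PLoRALayer:
    """Hypothetical sketch of a personalized LoRA (pLoRA) layer.

    Assumption: personalization is a per-client diagonal scaling `s`
    between the shared low-rank factors A and B. The pretrained
    weight W stays frozen, as in standard LoRA.
    """

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.02, (d_out, d_in))  # frozen pretrained weight
        self.A = rng.normal(0.0, 0.02, (rank, d_in))   # shared factor, aggregated by server
        self.B = np.zeros((d_out, rank))               # shared factor, zero-init as in LoRA
        self.s = np.ones(rank)                         # client-specific; never leaves the client

    def forward(self, x):
        # y = W x + B diag(s) A x : low-rank update, personalized by s
        return self.W @ x + self.B @ (self.s * (self.A @ x))
```

Under this assumption, only `A` and `B` are communicated, so the per-round payload matches plain LoRA, which is consistent with the summary's claim of no added communication overhead.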

📝 Abstract
Vision-language models (VLMs) demonstrate impressive zero-shot and few-shot learning capabilities, making them essential for several downstream tasks. However, fine-tuning these models at scale remains challenging, particularly in federated environments where data is decentralized and non-iid across clients. Existing parameter-efficient tuning methods like LoRA (Low-Rank Adaptation) reduce computational overhead but struggle with heterogeneous client data, leading to suboptimal generalization. To address these challenges, we propose FedVLM, a federated LoRA fine-tuning framework that enables decentralized adaptation of VLMs while preserving model privacy and reducing reliance on centralized training. To further tackle data heterogeneity, we introduce personalized LoRA (pLoRA), which dynamically adapts LoRA parameters to each client's unique data distribution, significantly improving local adaptation while maintaining global model aggregation. Experiments on the RLAIF-V dataset show that pLoRA improves client-specific performance by 24.5% over standard LoRA, demonstrating superior adaptation in non-iid settings. FedVLM provides a scalable and efficient solution for fine-tuning VLMs in federated settings, advancing personalized adaptation in distributed learning scenarios.
Problem

Research questions and friction points this paper is trying to address.

- Scalable fine-tuning of VLMs in federated learning environments
- Handling non-iid data heterogeneity across decentralized clients
- Improving personalized adaptation while maintaining model privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Federated LoRA fine-tuning for decentralized VLM adaptation
- Personalized LoRA dynamically adjusts to client data
- Scalable solution for federated VLM fine-tuning
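The federated side of the contributions above amounts to aggregating only the shared LoRA factors on the server while personalization parameters stay local. The paper's aggregation rule is described only as a "controllable aggregation mechanism," so this sketch assumes a plain size-weighted FedAvg over `(A, B)` pairs (the function name `aggregate_lora` is hypothetical):

```python
import numpy as np

def aggregate_lora(client_updates, client_sizes):
    """Hypothetical server step: size-weighted average of the shared
    LoRA factors (A, B) uploaded by each client.

    Client-specific personalization parameters are never transmitted,
    so the round's payload equals that of standard federated LoRA.
    """
    total = sum(client_sizes)
    A = sum((n / total) * a for (a, _), n in zip(client_updates, client_sizes))
    B = sum((n / total) * b for (_, b), n in zip(client_updates, client_sizes))
    return A, B
```

The returned `(A, B)` would be broadcast back to all clients, each of which keeps its own local adaptation parameters untouched.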