FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of deploying multimodal large language models (MLLMs) over distributed multimodal data under stringent privacy constraints, this paper proposes FedNano, the first lightweight federated fine-tuning framework in which the large model never resides on the device. The full MLLM stays exclusively on the server, while each client deploys only a modality-aware NanoEdge module comprising modality-specific encoders, cross-modal connectors, and a trainable low-rank NanoAdapter. This design eliminates client-side LLM deployment and ensures that no full model parameters are ever uploaded: the NanoAdapter limits communication overhead to just 0.01% of the model's parameters and reduces client-side storage by 95%. Transmitting only compact NanoAdapter updates also lets the framework accommodate heterogeneous client data and resource constraints. Extensive experiments show significant improvements over existing federated baselines on multimodal reasoning and cross-modal retrieval tasks, making FedNano the first practical federated tuning framework scalable to billion-parameter MLLMs.

📝 Abstract
Multimodal Large Language Models (MLLMs) excel in tasks like multimodal reasoning and cross-modal retrieval but face deployment challenges in real-world scenarios due to distributed multimodal data and strict privacy requirements. Federated Learning (FL) offers a solution by enabling collaborative model training without centralizing data. However, realizing FL for MLLMs presents significant challenges, including high computational demands, limited client capacity, substantial communication costs, and heterogeneous client data. Existing FL methods assume client-side deployment of full models, an assumption that breaks down for large-scale MLLMs due to their massive size and communication demands. To address these limitations, we propose FedNano, the first FL framework that centralizes the LLM on the server while introducing NanoEdge, a lightweight module for client-specific adaptation. NanoEdge employs modality-specific encoders, connectors, and trainable NanoAdapters with low-rank adaptation. This design eliminates the need to deploy the LLM on clients, reducing client-side storage by 95% and limiting communication overhead to only 0.01% of the model parameters. By transmitting only compact NanoAdapter updates, FedNano handles heterogeneous client data and resource constraints while preserving privacy. Experiments demonstrate that FedNano outperforms prior FL baselines, bridging the gap between MLLM scale and FL feasibility, and enabling scalable, decentralized multimodal AI systems.
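The 0.01% communication figure follows from the low-rank structure of the NanoAdapters: instead of sending a full dense update for a d×d projection, a client sends two thin factors of rank r. A minimal back-of-the-envelope sketch, using hypothetical dimensions (the paper does not publish its layer sizes or ranks here):

```python
# Illustrative parameter-count arithmetic for a LoRA-style low-rank adapter.
# The dimensions and rank below are assumptions for illustration, not values
# reported in the FedNano paper.
def adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a rank-r adapter: A (d_in x r) plus B (r x d_out)."""
    return d_in * rank + rank * d_out

full_layer = 4096 * 4096                     # one frozen dense projection
adapter = adapter_params(4096, 4096, rank=8) # 65,536 trainable parameters
ratio = adapter / full_layer
print(f"{ratio:.4%}")                        # well under 1% per layer
```

With only a subset of layers adapted, the trainable fraction of a billion-parameter MLLM shrinks further, which is how an overall overhead on the order of 0.01% becomes plausible.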
Problem

Research questions and friction points this paper is trying to address.

Addresses deployment challenges of MLLMs in privacy-sensitive distributed data scenarios
Reduces computational and communication costs in federated learning for large MLLMs
Enables lightweight client adaptation without full model deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Centralizes the LLM on the server; clients run only the lightweight NanoEdge module
Uses trainable low-rank NanoAdapters for client-specific adaptation
Reduces communication overhead to 0.01% of the model parameters by transmitting only NanoAdapter updates
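Because clients upload only their compact adapter updates, the server-side round reduces to averaging small matrices. A minimal sketch of such adapter-only aggregation, here as plain weighted FedAvg (the paper's exact aggregation rule may differ):

```python
import numpy as np

# Sketch: server aggregates only the clients' low-rank adapter matrices,
# weighted by hypothetical per-client sample counts. The frozen LLM on the
# server is untouched by this step.
def aggregate(adapter_updates, sample_counts):
    """Weighted average of per-client adapter matrices (FedAvg-style)."""
    total = sum(sample_counts)
    return sum((n / total) * u for u, n in zip(adapter_updates, sample_counts))

# Three toy clients with 4x2 adapter updates and made-up dataset sizes.
updates = [np.full((4, 2), v) for v in (1.0, 2.0, 3.0)]
sizes = [10, 20, 30]
avg = aggregate(updates, sizes)
print(avg[0, 0])  # (10*1 + 20*2 + 30*3) / 60 = 140/60
```

The key point this illustrates is that each round moves kilobytes of adapter parameters rather than gigabytes of LLM weights.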
🔎 Similar Papers