Elastic Mixture of Rank-Wise Experts for Knowledge Reuse in Federated Fine-Tuning

📅 2025-11-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Federated fine-tuning of large language models (LLMs) suffers from high computational and communication overhead, hindering deployment on resource-constrained devices. To address this, we propose SmartFed, a resource-efficient federated fine-tuning framework. Its core innovations are: (1) reusing pretrained LoRA modules to avoid training from scratch; (2) a Mixture of Rank-Wise Experts (MoRE) mechanism that enables semantics-aware, fine-grained knowledge selection at the rank level; and (3) an Elastic Expert Quota Allocation (EEQA) algorithm that dynamically balances parameter activation and computational load against input semantics and device resource budgets. SmartFed integrates LoRA decomposition, rank-level expert activation, gating-based combination, and federated collaborative optimization. Extensive experiments across multiple benchmarks show that SmartFed significantly improves model performance and training efficiency, reducing communication volume by up to 42% and computational cost by 37%.

📝 Abstract
Federated fine-tuning offers a promising solution for adapting Large Language Models (LLMs) to downstream tasks while safeguarding data privacy. However, its high computational and communication demands hinder its deployment on resource-constrained devices. In this paper, we propose SmartFed, a resource-efficient federated fine-tuning framework. SmartFed intelligently reuses knowledge embedded in existing LoRA modules, eliminating the need for expensive training from scratch when adapting LLMs to new tasks. To effectively exploit this knowledge and ensure scalability, we introduce the Mixture of Rank-Wise Experts (MoRE). MoRE decomposes LoRA modules into fine-grained rank-level experts. These experts are selectively activated and combined based on input semantics and resource budgets. Moreover, to optimize resource utilization, we present the Elastic Expert Quota Allocation (EEQA). EEQA adaptively allocates expert capacity across parameter matrices based on their contribution to model performance, focusing computing resources on the critical experts. Extensive evaluations across multiple benchmarks demonstrate that SmartFed significantly outperforms existing methods in model performance and training efficiency.
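The rank-level expert idea can be sketched as follows: a LoRA update ΔW = BA decomposes into rank-1 experts B[:, i] A[i, :], and a gate activates only the top-k experts for a given input. This is an illustrative sketch, not the paper's exact design; the gating network `W_gate`, the softmax combination rule, and all dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, top_k = 16, 16, 8, 3

# Pretrained LoRA module: delta_W = B @ A, with A: (rank, d_in), B: (d_out, rank).
A = rng.standard_normal((rank, d_in)) * 0.1
B = rng.standard_normal((d_out, rank)) * 0.1

# Hypothetical gating network: one logit per rank-level expert (assumption).
W_gate = rng.standard_normal((rank, d_in)) * 0.1

def more_delta(x, k=top_k):
    """Rank-wise mixture: treat each rank i as an expert B[:, i] (outer) A[i, :].
    Activate only the top-k experts for this input and combine them with
    softmax gate weights, so compute scales with k rather than full rank."""
    logits = W_gate @ x
    active = np.argsort(logits)[-k:]                  # indices of top-k experts
    weights = np.exp(logits[active] - logits[active].max())
    weights /= weights.sum()                          # softmax over active experts
    out = np.zeros(B.shape[0])
    for w, i in zip(weights, active):
        out += w * B[:, i] * (A[i, :] @ x)            # weighted rank-1 update
    return out

x = rng.standard_normal(d_in)
y = more_delta(x)
print(y.shape)  # (16,)
```

Because only k of the rank experts are evaluated, a client with a tighter resource budget can simply lower k without changing the module's parameters, which is the scalability property the abstract attributes to MoRE.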
Problem

Research questions and friction points this paper is trying to address.

High computational and communication costs of federated fine-tuning on resource-constrained devices
Wasteful from-scratch training when knowledge in existing LoRA modules could be reused
Uniform resource allocation that ignores which model components matter most
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reuses existing LoRA modules for knowledge transfer
Decomposes modules into rank-level experts for selective activation
Adaptively allocates expert capacity based on performance contribution
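The elastic quota idea in the last bullet can be sketched as a budgeted proportional split: given a per-matrix contribution score, divide a total expert budget across parameter matrices in proportion to those scores. The scoring rule and the largest-remainder rounding here are assumptions for illustration; the paper's exact EEQA procedure is not reproduced.

```python
def allocate_quotas(scores, total_budget, min_quota=1):
    """Split a total expert budget across parameter matrices proportionally
    to their contribution scores, guaranteeing each matrix min_quota experts.
    Largest-remainder rounding makes the quotas sum exactly to total_budget."""
    n = len(scores)
    remaining = total_budget - min_quota * n
    assert remaining >= 0, "budget too small for minimum quotas"
    total = sum(scores)
    raw = [remaining * s / total for s in scores]      # ideal fractional quotas
    quotas = [min_quota + int(r) for r in raw]         # floor each share
    leftover = total_budget - sum(quotas)
    # Hand out leftover experts to the largest fractional remainders.
    order = sorted(range(n), key=lambda i: raw[i] - int(raw[i]), reverse=True)
    for i in order[:leftover]:
        quotas[i] += 1
    return quotas

# Three matrices with contribution scores 0.5 / 0.3 / 0.2 share 10 experts.
print(allocate_quotas([0.5, 0.3, 0.2], total_budget=10))  # [5, 3, 2]
```

The elastic part is that `total_budget` can differ per device: a constrained client passes a smaller budget and the critical matrices keep the largest share of the remaining experts.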