FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts

📅 2025-05-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing PEFT methods (e.g., LoRA) uniformly deploy adapters across all layers of LLMs, ignoring inter-layer contribution heterogeneity and task-specific rank requirements, which leads to parameter redundancy and suboptimal efficiency. This paper proposes FLoE, a novel framework that introduces a Fisher-information-guided layer importance scoring mechanism, coupled with Bayesian optimization for task-aware automatic rank allocation. FLoE embeds LoRA modules into a Mixture-of-Experts (MoE) architecture, activating sparse low-rank experts only in critical layers. Evaluated across multiple models (Llama, Qwen) and benchmarks (Alpaca, MT-Bench), FLoE reduces trainable parameters by 37-52%, decreases GPU memory consumption by 41-49%, and cuts inference latency by 33%, while maintaining or even improving downstream task performance. The method significantly enhances the efficiency-accuracy trade-off, enabling resource-efficient adaptation in constrained deployment scenarios.
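The core idea behind the layer-selection step can be sketched in a few lines. The snippet below is a minimal illustration, not FLoE's actual implementation: it uses the common diagonal empirical-Fisher approximation (mean squared gradient of the loss w.r.t. each layer's parameters) to score layers, then keeps only the top-k layers for adapter placement. The gradient dictionary, layer names, and choice of k are all hypothetical.

```python
import numpy as np

def fisher_layer_scores(per_layer_grads):
    """Empirical (diagonal) Fisher importance per layer: the mean
    squared gradient of the loss w.r.t. that layer's parameters."""
    return {name: float(np.mean(np.square(g)))
            for name, g in per_layer_grads.items()}

def select_top_layers(scores, k):
    """Keep only the k layers with the highest Fisher score;
    adapters would be attached to these layers only."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy example: synthetic gradients for a 4-layer model, where
# layers 1 and 2 carry larger gradient magnitudes.
rng = np.random.default_rng(0)
grads = {f"layer_{i}": rng.normal(scale=s, size=1000)
         for i, s in enumerate([0.1, 1.0, 0.5, 0.05])}
scores = fisher_layer_scores(grads)
critical = select_top_layers(scores, k=2)  # -> ["layer_1", "layer_2"]
```

In a real run, the gradients would be accumulated over mini-batches of the downstream task before scoring, so the ranking reflects task-specific layer importance rather than a single noisy batch.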

📝 Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as a widely adopted strategy for adapting pre-trained Large Language Models (LLMs) to downstream tasks, significantly reducing memory and computational costs. However, most existing PEFT techniques uniformly deploy LoRA adapters across all layers, disregarding the intrinsic heterogeneity of layer contributions and task-specific rank requirements. This uniform paradigm leads to redundant parameter allocation and suboptimal adaptation efficiency. To address these limitations, we propose FLoE, a novel PEFT framework that introduces two key innovations: (i) a Fisher information-guided importance scoring mechanism to dynamically identify task-critical transformer layers for MoE-based low-rank adaptation, enabling sparse adapter deployment; and (ii) a Bayesian optimization-driven rank allocator that automatically determines optimal LoRA ranks on specific datasets without exhaustive grid search. Extensive experiments across diverse LLMs and benchmarks reveal that FLoE achieves impressive efficiency-accuracy trade-offs, making FLoE particularly advantageous in resource-constrained environments that necessitate rapid adaptation.
Problem

Research questions and friction points this paper is trying to address.

Uniform LoRA adapters ignore layer heterogeneity and task-specific rank needs
Redundant parameter allocation reduces adaptation efficiency in PEFT methods
How to select task-critical layers and allocate ranks without exhaustive grid search
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fisher-guided layer selection for sparse adaptation
Bayesian optimization for optimal rank allocation
MoE-based low-rank adaptation for efficiency
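The rank-allocation innovation can also be sketched. The code below is a hedged toy, not the paper's implementation: it runs a minimal 1-D Bayesian optimization (Gaussian-process posterior plus an upper-confidence-bound acquisition) over a discrete set of candidate LoRA ranks. The objective `val_score` is a stand-in for the validation score a short fine-tuning run would produce; the candidate grid, kernel, and UCB coefficient are all assumptions.

```python
import numpy as np

# Hypothetical validation-score curve over LoRA rank; in practice this
# would come from short fine-tuning runs, not a closed-form function.
def val_score(rank):
    return -((np.log2(rank) - 3.0) ** 2)  # toy curve peaking at rank 8

CANDIDATE_RANKS = np.array([1, 2, 4, 8, 16, 32, 64])

def rbf(a, b, ell=1.0):
    """RBF kernel over log2(rank), so ranks are compared on a log scale."""
    d = np.subtract.outer(np.log2(a), np.log2(b))
    return np.exp(-0.5 * (d / ell) ** 2)

def bo_pick_rank(n_iters=4, noise=1e-6):
    """Minimal Bayesian optimization over discrete ranks:
    fit a GP to scores observed so far, then evaluate the rank
    with the highest upper confidence bound (mu + 2*sigma)."""
    tried = [int(CANDIDATE_RANKS[0]), int(CANDIDATE_RANKS[-1])]
    ys = [val_score(r) for r in tried]
    for _ in range(n_iters):
        X = np.array(tried, dtype=float)
        K = rbf(X, X) + noise * np.eye(len(X))
        Kinv = np.linalg.inv(K)
        ks = rbf(CANDIDATE_RANKS, X)
        mu = ks @ Kinv @ np.array(ys)                       # posterior mean
        var = 1.0 - np.einsum("ij,jk,ik->i", ks, Kinv, ks)  # posterior variance
        ucb = mu + 2.0 * np.sqrt(np.clip(var, 0.0, None))
        ucb[np.isin(CANDIDATE_RANKS, tried)] = -np.inf      # no repeats
        nxt = int(CANDIDATE_RANKS[np.argmax(ucb)])
        tried.append(nxt)
        ys.append(val_score(nxt))
    return tried[int(np.argmax(ys))]

best_rank = bo_pick_rank()  # -> 8 for this toy objective
```

The point of the sketch is the search strategy: each candidate rank costs one fine-tuning run, so a surrogate model that balances exploitation (high posterior mean) against exploration (high posterior variance) finds a good rank in far fewer evaluations than a full grid search.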
👥 Authors
Xinyi Wang (Zhejiang University)
Lirong Gao (Zhejiang University)
Haobo Wang (Zhejiang University)
Yiming Zhang (Zhejiang University)
Junbo Zhao (Zhejiang University)