🤖 AI Summary
LoRA suffers from structural bottlenecks at high ranks, causing gradient entanglement across input channels, which leads to overfitting and performance saturation and hinders its ability to approximate full fine-tuning (FFT). To address this, we propose Granular Low-Rank Adaptation (GraLoRA), the first sub-block-level low-rank adaptation framework: it partitions the weight matrix into fine-grained blocks and assigns each block an independent low-rank adapter, explicitly decoupling gradient propagation paths. This design incurs virtually zero additional parameters or computational overhead while substantially enhancing representational capacity and FFT approximation fidelity. On HumanEval+, GraLoRA achieves up to a +8.5% absolute gain in Pass@1, consistently outperforming LoRA and other PEFT baselines across diverse model scales and rank configurations, demonstrating strong robustness and scalability.
📝 Abstract
Low-Rank Adaptation (LoRA) is a popular method for parameter-efficient fine-tuning (PEFT) of generative models, valued for its simplicity and effectiveness. Despite recent enhancements, LoRA still suffers from a fundamental limitation: overfitting when the bottleneck is widened. It performs best at ranks 32-64, yet its accuracy stagnates or declines at higher ranks, still falling short of full fine-tuning (FFT) performance. We identify the root cause as LoRA's structural bottleneck, which introduces gradient entanglement across unrelated input channels and distorts gradient propagation. To address this, we introduce a novel structure, Granular Low-Rank Adaptation (GraLoRA), which partitions weight matrices into sub-blocks, each with its own low-rank adapter. With negligible computational or storage cost, GraLoRA overcomes LoRA's limitations, effectively increases the representational capacity, and more closely approximates FFT behavior. Experiments on code generation and commonsense reasoning benchmarks show that GraLoRA consistently outperforms LoRA and other baselines, achieving up to a +8.5% absolute gain in Pass@1 on HumanEval+. These improvements hold across model sizes and rank settings, making GraLoRA a scalable and robust solution for PEFT. Code, data, and scripts are available at https://github.com/SqueezeBits/GraLoRA.git
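The core idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes an even k x k partition of the weight matrix with per-block rank R/k (the function name `gralora_delta` and all variable names are our own). Under that parameter split, the adapter parameter count matches plain LoRA at global rank R, while the composed weight update can reach rank up to k * R instead of R:

```python
import numpy as np

def gralora_delta(in_dim, out_dim, rank, k, rng):
    """Weight update for one linear layer under a GraLoRA-style scheme:
    the (out_dim x in_dim) matrix is split into a k x k grid of
    sub-blocks, and each sub-block gets its own low-rank adapter pair.
    Per-block rank = rank // k keeps the total adapter parameter count
    identical to plain LoRA at the same global rank."""
    assert in_dim % k == 0 and out_dim % k == 0 and rank % k == 0
    r = rank // k                        # per-block rank
    bo, bi = out_dim // k, in_dim // k   # sub-block shape
    delta = np.zeros((out_dim, in_dim))
    n_params = 0
    for i in range(k):                   # row of output sub-blocks
        for j in range(k):               # column of input sub-blocks
            B = rng.standard_normal((bo, r))  # per-block up-projection
            A = rng.standard_normal((r, bi))  # per-block down-projection
            delta[i * bo:(i + 1) * bo, j * bi:(j + 1) * bi] = B @ A
            n_params += B.size + A.size
    return delta, n_params

rng = np.random.default_rng(0)
in_dim, out_dim, rank, k = 32, 32, 4, 2
delta, gralora_params = gralora_delta(in_dim, out_dim, rank, k, rng)
lora_params = rank * (in_dim + out_dim)  # plain LoRA: (out x R) and (R x in)

print(gralora_params == lora_params)     # same parameter budget
print(np.linalg.matrix_rank(delta))      # generically k * rank, i.e. up to 8 here
```

Because each sub-block owns an independent adapter, the gradient of one input-channel group no longer flows through a projection shared with every other group, which is the decoupling the abstract refers to.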