🤖 AI Summary
To address performance bottlenecks in LoRA fine-tuning caused by fixed rank assignment and static initialization, this paper proposes a gradient-driven paradigm that couples dynamic rank allocation with adapter initialization. It is the first to jointly optimize rank estimation and weight initialization: layer-wise ranks are allocated according to gradient sensitivity, while adapter weights are initialized via adaptive SVD and refined through a lightweight parameter reweighting mechanism, preserving LoRA's efficiency while substantially enhancing representational capacity. The method is architecture-agnostic, supporting mainstream models including T5 and Llama3.1. Experiments demonstrate significant gains: +5.88 points over standard LoRA on GLUE, slightly surpassing full fine-tuning; +5.13 points on GSM8K; and, under high-rank configurations, a 2.05-point advantage over full fine-tuning. These results challenge the traditional efficiency-effectiveness trade-off in parameter-efficient adaptation.
📝 Abstract
Low-Rank Adaptation (LoRA) is a crucial method for efficiently fine-tuning pretrained large language models (LLMs), and its performance is largely determined by two key factors: rank and initialization strategy. Numerous LoRA variants have been proposed to improve on these factors, but they often compromise LoRA's usability or efficiency. In this paper, we analyze the fundamental limitations of existing methods and introduce a novel approach, GoRA (Gradient-driven Adaptive Low Rank Adaptation), which simultaneously assigns ranks and initializes weights for low-rank adapters based on gradient information. Extensive experimental results demonstrate that GoRA significantly improves performance while preserving the high usability and efficiency of LoRA. On the T5 model fine-tuned on the GLUE benchmark, GoRA achieves a 5.88-point improvement over LoRA and slightly surpasses full fine-tuning. Similarly, on the Llama3.1-8B-Base model fine-tuned on GSM8K, GoRA outperforms LoRA by 5.13 points and exceeds full fine-tuning in high-rank settings by a margin of 2.05 points.
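The abstract describes two gradient-driven steps: allocating layer-wise ranks from gradient sensitivity and initializing adapter weights from an SVD of gradient information. The exact rules are not given here, so the following is a minimal NumPy sketch under stated assumptions: sensitivity is taken as the nuclear norm of each layer's accumulated gradient, ranks are distributed proportionally to those scores within a total budget, and the adapter factors `B`, `A` come from a truncated SVD of the gradient, scaled as a small negative gradient step. The function names and the `scale`, `r_min` parameters are illustrative, not from the paper.

```python
import numpy as np

def allocate_ranks(grads, total_rank_budget, r_min=2):
    """Assumed scheme: split a total rank budget across layers in
    proportion to the nuclear norm of each accumulated gradient."""
    scores = np.array([np.linalg.norm(g, "nuc") for g in grads])
    shares = scores / scores.sum()
    # each layer keeps at least r_min ranks
    return np.maximum(r_min, np.round(shares * total_rank_budget).astype(int))

def init_adapter(grad, rank, scale=1e-3):
    """Assumed scheme: truncated SVD of the gradient; B @ A then
    approximates a small negative gradient step at initialization."""
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    sqrt_s = np.sqrt(S[:rank])
    B = -scale * (U[:, :rank] * sqrt_s)      # shape (out_dim, rank)
    A = sqrt_s[:, None] * Vt[:rank]          # shape (rank, in_dim)
    return B, A
```

A usage pass would accumulate gradients of the frozen weights on a few batches, call `allocate_ranks` once, then `init_adapter` per layer before training the adapters as in standard LoRA.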