Sparse Matrix in Large Language Model Fine-tuning

📅 2024-05-24
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
📄 PDF
🤖 AI Summary
To bridge the persistent accuracy gap between parameter-efficient fine-tuning (PEFT) and full fine-tuning (FT), this paper proposes Sparse Matrix Tuning (SMT), a gradient-driven structured-sparsity method. SMT identifies the most informative sub-matrices in the gradient update and fine-tunes only those blocks, substantially reducing computational and memory overhead when adapting LLaMA-family models. Its core contributions are a systematic analysis of the root causes of the PEFT–FT accuracy gap and a gradient-sensitive structured-sparsity mechanism that avoids the performance plateau conventional PEFT methods hit as the trainable-parameter budget grows. Experiments show that SMT outperforms state-of-the-art baselines, including LoRA and DoRA, across multiple tasks, reduces GPU memory consumption by 67% relative to FT, and generalizes more robustly across tasks.

📝 Abstract
LoRA and its variants have become popular parameter-efficient fine-tuning (PEFT) methods due to their ability to avoid excessive computational costs. However, an accuracy gap often exists between PEFT methods and full fine-tuning (FT), and this gap has yet to be systematically studied. In this work, we introduce a method for selecting sparse sub-matrices that aims to minimize the performance gap between PEFT and FT while also reducing both the computational and memory costs of fine-tuning. Our Sparse Matrix Tuning (SMT) method begins by identifying the most significant sub-matrices in the gradient update, then updates only these blocks during the fine-tuning process. In our experiments, we demonstrate that SMT consistently surpasses other PEFT baselines (e.g., LoRA and DoRA) in fine-tuning popular large language models such as LLaMA across a broad spectrum of tasks, while reducing the GPU memory footprint by 67% compared to FT. We also examine how the performance of LoRA and DoRA tends to plateau and decline as the number of trainable parameters increases; in contrast, our SMT method does not suffer from this issue.
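The core mechanism described in the abstract — scoring sub-matrices of the gradient and updating only the highest-scoring blocks — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the block size, the mean-absolute-gradient scoring rule, and the plain SGD-style update are assumptions chosen for clarity.

```python
import numpy as np

def select_top_blocks(grad, block, k):
    """Score each (block x block) sub-matrix of the gradient by its mean
    absolute value and return the grid indices of the k highest scorers.
    (Hypothetical scoring rule; the paper may use a different criterion.)"""
    rows, cols = grad.shape
    assert rows % block == 0 and cols % block == 0
    nb_r, nb_c = rows // block, cols // block
    # View the gradient as an (nb_r x nb_c) grid of blocks and score each one.
    blocks = grad.reshape(nb_r, block, nb_c, block)
    scores = np.abs(blocks).mean(axis=(1, 3))        # shape (nb_r, nb_c)
    top = np.argsort(scores, axis=None)[::-1][:k]    # flat ids of top-k blocks
    return [divmod(int(i), nb_c) for i in top]

def sparse_update(weight, grad, selected, block, lr=1e-2):
    """Apply a gradient step only inside the selected blocks; every other
    entry of `weight` stays frozen, mimicking SMT's sparse fine-tuning."""
    out = weight.copy()
    for bi, bj in selected:
        r, c = bi * block, bj * block
        out[r:r + block, c:c + block] -= lr * grad[r:r + block, c:c + block]
    return out
```

In a full training loop, the selection step would typically run once on early-iteration gradients, after which only the chosen blocks (a small fraction of all parameters) receive optimizer state and updates, which is where the memory savings come from.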
Problem

Research questions and friction points this paper is trying to address.

Minimizes accuracy gap between PEFT and full fine-tuning
Reduces computational and memory costs in fine-tuning
Improves performance over LoRA and DoRA in LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selects sparse sub-matrices to minimize performance gap
Updates only significant gradient blocks during fine-tuning
Reduces GPU memory footprint by 67% compared to FT