RoCoFT: Efficient Finetuning of Large Language Models with Row-Column Updates

📅 2024-10-14

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To reduce the memory and computational overhead of parameter-efficient fine-tuning (PEFT) for large language models (LLMs), this paper proposes RoCoFT, a row-column update scheme that trains only a small subset of rows and columns in Transformer weight matrices, which amounts to a structured-sparsity approach to PEFT. Using tools from Neural Tangent Kernel (NTK) theory, the authors show empirically that the kernel built from the restricted row and column parameters is numerically close to the full-parameter fine-tuning kernel. The paper examines row/column selection strategies and the choice of rank, and validates the method across diverse architectures, including BERT, RoBERTa, Bloom, and Llama2. On mainstream benchmarks, RoCoFT is reported to match or exceed the accuracy of LoRA and Adapter baselines while reducing GPU memory consumption by 30–50% and accelerating training by 1.5–2×. It is also reported to scale to 7B- and 13B-parameter models.

📝 Abstract

We propose RoCoFT, a parameter-efficient fine-tuning method for large-scale language models (LMs) based on updating only a few rows and columns of the weight matrices in transformers. Through extensive experiments with medium-size LMs like BERT and RoBERTa, and larger LMs like Bloom-7B, Llama2-7B, and Llama2-13B, we show that our method gives comparable or better accuracies than state-of-the-art PEFT methods while also being more memory- and computation-efficient. We also study the reason behind the effectiveness of our method with tools from neural tangent kernel theory. We empirically demonstrate that our kernel, constructed using a restricted set of row and column parameters, is numerically close to the full-parameter kernel and gives comparable classification performance. Ablation studies are conducted to investigate the impact of different algorithmic choices, including the selection strategy for rows and columns as well as the optimal rank for effective implementation of our method.
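
The core idea in the abstract, updating only a few rows or columns of a weight matrix, can be illustrated with a toy sketch. This is not the authors' implementation: the function name `masked_sgd_step`, the squared-error loss, the single linear layer, and the choice of which rows or columns to train are all assumptions made for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
def masked_sgd_step(W, x, y, rows=(), cols=(), lr=0.1):
    """One SGD step on 0.5 * ||W x - y||^2 that touches only selected rows/columns.

    W is a list of rows (d_out x d_in); every entry outside the selected
    rows and columns stays frozen. Hypothetical helper for illustration.
    """
    rows, cols = set(rows), set(cols)
    # Forward pass with the current weights: pred = W x
    pred = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    err = [p - t for p, t in zip(pred, y)]
    # Gradient of the loss w.r.t. W[i][j] is err[i] * x[j];
    # apply it only where the entry lies in a trainable row or column.
    for i in range(len(W)):
        for j in range(len(x)):
            if i in rows or j in cols:
                W[i][j] -= lr * err[i] * x[j]
    return W


x = [1.0, 2.0, 3.0]
y = [1.0, 1.0, 1.0, 1.0]

# Row variant: only row 0 of a 4 x 3 matrix is trainable.
W_row = masked_sgd_step([[0.0] * 3 for _ in range(4)], x, y, rows=[0])

# Column variant: only column 1 is trainable.
W_col = masked_sgd_step([[0.0] * 3 for _ in range(4)], x, y, cols=[1])
```

After one step, `W_row` differs from its initial value only in row 0 and `W_col` only in column 1, so the optimizer state and gradients that need to be kept scale with the number of selected rows or columns rather than with the full matrix.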

Problem

Research questions and friction points this paper is trying to address.

Efficient fine-tuning of large language models with minimal parameter updates

Achieving high accuracy with reduced memory and computational costs

Analyzing effectiveness using neural tangent kernel theory
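
To make "minimal parameter updates" concrete, the following sketch counts trainable parameters for a single weight matrix under full fine-tuning, a LoRA-style rank-r update, and an r-row or r-column update. The function name `trainable_params`, the 4096 x 4096 matrix size, and r = 3 are illustrative assumptions and are not figures taken from the paper.

```python
def trainable_params(d_out, d_in, r):
    """Trainable parameter counts for one d_out x d_in weight matrix."""
    return {
        "full": d_out * d_in,        # every entry is trained
        "lora": r * (d_out + d_in),  # two low-rank factors: d_out x r and r x d_in
        "rows": r * d_in,            # r full rows of the matrix
        "cols": r * d_out,           # r full columns of the matrix
    }


counts = trainable_params(d_out=4096, d_in=4096, r=3)
for name, n in counts.items():
    print(f"{name:>5}: {n:>10,d}  ({100 * n / counts['full']:.3f}% of full)")
```

For a square matrix, an r-row or r-column update trains half as many parameters as a rank-r LoRA update, because it needs one d-by-r block instead of two.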

Innovation

Methods, ideas, or system contributions that make the work stand out.

Row-column updates for efficient fine-tuning

Memory and computation-efficient PEFT method

Neural tangent kernel theory analysis
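
The kernel comparison described in the abstract can be written with the standard empirical NTK definition. The notation here is assumed for illustration and is not taken from the paper: f(x; θ) is a scalar model output, θ is the full parameter vector, and θ_S is the subset of parameters lying in the selected rows and columns.

```latex
% Empirical NTK over all parameters \theta
\[
K_{\mathrm{full}}(x, x') = \nabla_{\theta} f(x;\theta)^{\top} \, \nabla_{\theta} f(x';\theta)
\]

% Same construction restricted to the selected row/column parameters \theta_S
\[
K_{S}(x, x') = \nabla_{\theta_S} f(x;\theta)^{\top} \, \nabla_{\theta_S} f(x';\theta)
\]
```

Because the inner product sums over parameters, the full kernel splits into the contribution from θ_S plus the contribution from the remaining frozen parameters. The abstract's empirical claim is that the restricted kernel is numerically close to the full one and gives comparable classification performance.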
