Parameter-Efficient Fine-Tuning with Circulant and Diagonal Vectors

📅 2025-05-01
🤖 AI Summary
Large language model (LLM) fine-tuning faces challenges including high computational and memory overhead, and difficulty in adapting to non-square weight matrices. This paper proposes a parameter-efficient fine-tuning (PEFT) method based on circulant-diagonal decomposition (CDD), the first to introduce CDD into the Fourier domain. It replaces computationally expensive 2D FFTs with efficient 1D FFTs for accelerated frequency-domain operations and introduces a block-wise mechanism for non-square weights, eliminating explicit construction of the weight update matrix. Crucially, the method models arbitrary-shaped low-rank updates without introducing additional trainable parameters, substantially reducing both FLOPs and memory footprint. Experiments across multiple downstream tasks demonstrate performance on par with or superior to state-of-the-art PEFT methods—including LoRA and AdaLoRA—while reducing trainable parameters by up to 90% and peak memory usage by up to 67%.
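The speedup from replacing 2D FFTs with 1D FFTs rests on a standard identity: a circulant matrix acts on a vector as a circular convolution, which the convolution theorem reduces to elementwise products in the frequency domain. A minimal NumPy sketch of that identity (not the authors' implementation; variable names are illustrative):

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix defined by first column c with x.

    Uses C @ x = ifft(fft(c) * fft(x)), i.e. circular convolution via
    the convolution theorem, so the product costs O(n log n) and the
    dense n x n matrix C is never materialized.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Dense reference for verification: column j of C is c rotated by j.
rng = np.random.default_rng(0)
n = 8
c = rng.standard_normal(n)
x = rng.standard_normal(n)
C = np.column_stack([np.roll(c, j) for j in range(n)])
```

Here the dense matrix `C` is built only to check the FFT-based product; in the fine-tuning setting only the length-n first column is stored and trained.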

📝 Abstract
Foundation models have achieved tremendous success across domains. However, their enormous computational and storage costs make them difficult to fine-tune and less practical to deploy. Recent work shows that training in the Fourier domain can be an effective fine-tuning method in terms of both model performance and the number of trainable parameters. In this work, we propose to further reduce the complexity by factorizing the weight update as a product of interleaved circulant and diagonal matrices. In addition, we address the case of non-square fine-tuning weights by partitioning the circulant matrix into blocks. Our method avoids constructing the weight change matrix and uses the 1D fast Fourier transform (FFT) instead of the 2D FFT. Experimental results show that our method achieves similar or better performance across various tasks with far fewer floating-point operations (FLOPs) and trainable parameters.
Problem

Research questions and friction points this paper is trying to address.

Reducing computation and storage complexity in foundation models
Improving efficiency of fine-tuning with circulant and diagonal matrices
Enhancing performance with fewer FLOPs and trainable parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Factorizes the weight update as a product of interleaved circulant and diagonal matrices
Partitions the circulant matrix into blocks to handle non-square weights
Uses 1D FFTs in place of 2D FFTs for frequency-domain operations
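The interleaved factorization above can be sketched for the square case: the update is applied as D_k C_k ... D_1 C_1 x, where each circulant factor C_i is stored as one length-n column and applied via a 1D FFT, and each D_i is a stored diagonal. This is a minimal NumPy illustration of the idea, not the paper's code; the block partitioning for non-square weights is omitted here.

```python
import numpy as np

def circ(c):
    """Dense circulant matrix with first column c (verification only)."""
    return np.column_stack([np.roll(c, j) for j in range(len(c))])

def cd_update(x, circ_cols, diags):
    """Apply the interleaved product D_k C_k ... D_1 C_1 to x.

    Each circulant factor is represented by its first column and applied
    with a 1D FFT via the convolution theorem, so the full update matrix
    is never constructed; each diagonal factor is an elementwise scale.
    """
    y = x
    for c, d in zip(circ_cols, diags):
        y = np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(y)))  # C_i @ y in O(n log n)
        y = d * y                                                # D_i @ y in O(n)
    return y

# Check the FFT-based product against an explicitly built D2 C2 D1 C1.
rng = np.random.default_rng(1)
n = 8
c1, c2 = rng.standard_normal(n), rng.standard_normal(n)
d1, d2 = rng.standard_normal(n), rng.standard_normal(n)
x = rng.standard_normal(n)
M = np.diag(d2) @ circ(c2) @ np.diag(d1) @ circ(c1)
```

Storing only the columns `c_i` and diagonals `d_i` gives 2kn trainable parameters per n x n update, versus n^2 for a dense update matrix.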