🤖 AI Summary
To address the trade-off between performance and parameter/memory overhead in large-model fine-tuning, this paper proposes sDCTFT, the first parameter-efficient fine-tuning (PEFT) method to integrate the Discrete Cosine Transform (DCT). Its core innovation is to project low-rank weight updates into the frequency domain and exploit the DCT's energy compaction and decorrelation properties to perform hierarchical spectral sparsification, retaining only the most critical frequency components. On instruction tuning of LLaMA3.1-8B, sDCTFT achieves superior accuracy with merely 0.05M trainable parameters, roughly 99.9% fewer than LoRA and 30% fewer than FourierFT, while significantly lowering both storage and computational costs.
📝 Abstract
In the era of large language models, parameter-efficient fine-tuning (PEFT) has been extensively studied. However, these approaches usually operate in the spatial domain, which poses storage challenges, especially when handling extensive adaptations or larger models. The frequency domain, in contrast, is more effective at compressing trainable parameters while maintaining expressive capability. In this paper, we propose a novel Selective Discrete Cosine Transform Fine-Tuning (sDCTFT) scheme to push this frontier. Its general idea is to exploit the superior energy compaction and decorrelation properties of the DCT to improve both model efficiency and accuracy. Specifically, it projects the weight change from the low-rank adaptation into the discrete cosine space. The weight change is then partitioned over different levels of the discrete cosine spectrum, and the most critical frequency components in each partition are selected. Extensive experiments on four benchmark datasets demonstrate the superior accuracy, reduced computational cost, and lower storage requirements of the proposed method compared with prior methods. For instance, when performing instruction tuning on the LLaMA3.1-8B model, sDCTFT outperforms LoRA with just 0.05M trainable parameters versus LoRA's 38.2M, and surpasses FourierFT with 30% fewer trainable parameters. The source code will be made publicly available.
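To make the projection-and-selection step concrete, below is a minimal sketch of the idea under stated assumptions: a low-rank update is mapped into the 2-D DCT spectrum, the spectrum is partitioned into frequency bands, and only the largest-magnitude coefficients in each band are kept. The function name, the band layout, and the per-band budget are illustrative choices, not the authors' released implementation.

```python
# Minimal sketch of selective DCT sparsification of a low-rank update.
# Assumptions: NumPy/SciPy, diagonal frequency bands, fixed per-band budget.
import numpy as np
from scipy.fft import dctn, idctn

def sparse_dct_update(A, B, bands=4, keep_per_band=16):
    """Project the low-rank update A @ B into the 2-D DCT spectrum, keep only
    the largest-magnitude coefficients within each frequency band, and map the
    sparsified spectrum back to weight space."""
    delta_w = A @ B                                    # low-rank weight change
    spec = dctn(delta_w, norm="ortho")                 # discrete cosine spectrum
    d_out, d_in = spec.shape

    # Partition the spectrum into `bands` levels from low to high frequency.
    i, j = np.meshgrid(np.arange(d_out), np.arange(d_in), indexing="ij")
    level = np.minimum(((i / d_out + j / d_in) / 2 * bands).astype(int), bands - 1)

    flat_spec = spec.ravel()
    keep = np.zeros(flat_spec.size, dtype=bool)
    for b in range(bands):
        idx = np.flatnonzero(level.ravel() == b)
        if idx.size:
            # Select the most critical (largest-magnitude) components per band.
            top = idx[np.argsort(np.abs(flat_spec[idx]))[::-1][:keep_per_band]]
            keep[top] = True

    filtered = np.where(keep.reshape(spec.shape), spec, 0.0)
    return idctn(filtered, norm="ortho")               # back to weight space

# Toy usage: sparsify a rank-8 update for a 64x64 weight matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(64, 8))
B = rng.normal(size=(8, 64))
delta_w_sparse = sparse_dct_update(A, B)
print(delta_w_sparse.shape)  # (64, 64)
```

In an actual PEFT setting, presumably only the retained spectral coefficients (and their indices) would need to be stored and trained, which is where the storage savings over dense low-rank updates would come from.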