🤖 AI Summary
Existing parameter-efficient fine-tuning (PEFT) methods struggle to jointly optimize quantization and adaptation for large language models (LLMs): low-rank adapters suffer from limited representational capacity, while Fourier-based adapters, though more expressive, exhibit poor quantization robustness and high computational overhead. To address this, we propose a novel PEFT framework based on the Walsh-Hadamard Transform (WHT), which replaces the conventional Fourier transform as the adapter's transform kernel. Our method introduces an adaptive parameter selection mechanism and a value-optimized initialization strategy, significantly enhancing both representational capability and robustness under quantization-aware fine-tuning. Experiments demonstrate that our approach achieves superior accuracy over state-of-the-art PEFT baselines under 2-4-bit quantization, accelerates training by 2.1× compared to Fourier-based adapters, and simultaneously attains high accuracy, low computational cost, and strong generalization across diverse tasks and model scales.
📝 Abstract
The demand for efficient deployment of large language models (LLMs) has driven interest in quantization, which reduces inference cost, and parameter-efficient fine-tuning (PEFT), which lowers training overhead. These trends have motivated quantization-aware PEFT, which aims to produce accurate yet efficient quantized models. In this setting, reducing quantization error prior to fine-tuning is crucial for achieving high model accuracy. However, existing methods that rely on low-rank adaptation suffer from limited representational capacity. Recent Fourier-related transform (FT)-based adapters offer greater representational power than low-rank adapters, but their direct integration into quantized models often results in ineffective error reduction and increased computational overhead. To overcome these limitations, we propose QWHA, a method that integrates FT-based adapters into quantized models by employing the Walsh-Hadamard Transform (WHT) as the transform kernel, together with a novel adapter initialization scheme incorporating adaptive parameter selection and value refinement. We demonstrate that QWHA effectively mitigates quantization errors while facilitating fine-tuning, and that its design substantially reduces computational cost. Experimental results show that QWHA consistently outperforms baselines in low-bit quantization accuracy and achieves significant training speedups over existing FT-based adapters. The code is available at https://github.com/vantaa89/qwha.
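To make the two core ingredients concrete, the sketch below illustrates (a) why the WHT is cheap, since a fast transform needs only additions and subtractions, no complex multiplies, and (b) how "adaptive parameter selection" can be read as keeping the largest-magnitude transform-domain coefficients of the quantization error. This is an illustrative toy, not the authors' implementation: the matrix size, the parameter budget `k`, and the top-k selection rule are assumptions made for the example.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform along the last axis (Hadamard order).
    O(n log n) with only additions/subtractions; n must be a power of two.
    Applying it twice returns n * x, since H @ H = n * I."""
    x = np.array(x, dtype=float)
    n = x.shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # Butterfly: pair entries at distance h within blocks of size 2h.
        y = x.reshape(*x.shape[:-1], n // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        x = np.stack([a + b, a - b], axis=-2).reshape(x.shape)
        h *= 2
    return x

def fwht2(m):
    """2-D WHT: transform rows, then columns."""
    return fwht(fwht(m).T).T

rng = np.random.default_rng(0)
E = rng.standard_normal((16, 16))      # stand-in for a quantization-error matrix
S = fwht2(E)                           # error spectrum in the WHT domain
k = 32                                 # hypothetical adapter parameter budget
thresh = np.sort(np.abs(S).ravel())[-k]
S_k = np.where(np.abs(S) >= thresh, S, 0.0)   # keep the k largest coefficients
E_hat = fwht2(S_k) / E.size            # inverse (WHT is n*I-involutory per axis)

# A sparse set of WHT coefficients already absorbs part of the error.
print(np.linalg.norm(E - E_hat) < np.linalg.norm(E))
```

Because the WHT is orthogonal up to scale, keeping the top-k coefficients is the best rank-free k-parameter approximation of the error in that basis, which is the intuition behind initializing the adapter to cancel quantization error before fine-tuning begins.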