QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing parameter-efficient fine-tuning (PEFT) methods struggle to jointly optimize quantization and adaptation for large language models (LLMs): low-rank adapters suffer from limited representational capacity, while Fourier-based adapters, though more expressive, exhibit poor quantization robustness and high computational overhead. To address this, we propose a novel PEFT framework that replaces the conventional Fourier transform with the Walsh-Hadamard Transform (WHT). Our method introduces an adaptive parameter selection mechanism and a value-refined initialization strategy, significantly enhancing both representational capacity and robustness under quantization-aware fine-tuning. Experiments demonstrate that our approach achieves superior accuracy over state-of-the-art PEFT baselines under 2–4-bit quantization, accelerates training by 2.1× compared to Fourier-based adapters, and simultaneously attains high accuracy, low computational cost, and strong generalization across diverse tasks and model scales.
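The WHT kernel at the heart of this approach can be computed with a simple butterfly recursion that uses only additions and subtractions, one reason it is cheaper than an FFT-based kernel. A minimal sketch (not the authors' implementation):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform in Sylvester (natural) order.

    Runs in O(n log n) time, and the butterfly needs only +/-,
    no complex multiplications, unlike an FFT.
    """
    x = np.asarray(x, dtype=np.float64).copy()
    n = x.size
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # butterfly: sum and difference
        h *= 2
    return x
```

Since the Hadamard matrix is its own inverse up to a factor of n, applying `fwht` twice returns the input scaled by n.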

📝 Abstract
The demand for efficient deployment of large language models (LLMs) has driven interest in quantization, which reduces inference cost, and parameter-efficient fine-tuning (PEFT), which lowers training overhead. This motivated the development of quantization-aware PEFT to produce accurate yet efficient quantized models. In this setting, reducing quantization error prior to fine-tuning is crucial for achieving high model accuracy. However, existing methods that rely on low-rank adaptation suffer from limited representational capacity. Recent Fourier-related transform (FT)-based adapters offer greater representational power than low-rank adapters, but their direct integration into quantized models often results in ineffective error reduction and increased computational overhead. To overcome these limitations, we propose QWHA, a method that integrates FT-based adapters into quantized models by employing the Walsh-Hadamard Transform (WHT) as the transform kernel, together with a novel adapter initialization scheme incorporating adaptive parameter selection and value refinement. We demonstrate that QWHA effectively mitigates quantization errors while facilitating fine-tuning, and that its design substantially reduces computational cost. Experimental results show that QWHA consistently outperforms baselines in low-bit quantization accuracy and achieves significant training speedups over existing FT-based adapters. The code is available at https://github.com/vantaa89/qwha.
Problem

Research questions and friction points this paper is trying to address.

Reducing quantization error in large language models before fine-tuning
Overcoming limited representational capacity of low-rank adaptation methods
Addressing computational overhead of Fourier-transform based adapters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Walsh-Hadamard Transform for adaptation
Introduces novel adapter initialization with parameter selection
Reduces quantization error and computational cost
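To make the adapter idea concrete: an FT-style adapter with a WHT kernel can reconstruct a dense weight update from a small set of trainable spectral coefficients at selected positions. The sketch below is a rough illustration under assumed conventions; the function name, normalization, and parameter layout are hypothetical, not the paper's API.

```python
import numpy as np

def wht_adapter_delta(coeffs, rows, cols, n, alpha=1.0):
    """Synthesize a dense n x n weight update from k spectral coefficients.

    Only (rows, cols, coeffs) would be trainable; the dense update is
    recovered as alpha * H @ S @ H / n, with S the sparse spectral matrix
    and H the Sylvester-ordered Hadamard matrix (illustrative scaling).
    """
    # Sparse spectral matrix: k trainable (row, col, value) entries.
    S = np.zeros((n, n))
    S[rows, cols] = coeffs
    # Build H_n by the Sylvester recursion (n must be a power of two).
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    # 2D Walsh-Hadamard synthesis of the dense update.
    return alpha * (H @ S @ H) / n
```

Because H contains only ±1 entries, this synthesis needs no multiplications beyond the final scaling, in contrast to the complex arithmetic of a Fourier kernel.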