🤖 AI Summary
To address the performance degradation caused by post-training quantization (PTQ) of LoRA adapters, this paper introduces a sinusoidal nonlinear activation into the quantized LoRA design, enhancing expressivity and rank stability at low bit-widths without increasing the parameter count. The authors prove theoretically that the sinusoidal transformation tightly preserves the stable rank of adapters before and after quantization, so that combining the low-rank decomposition with PTQ allows compression and accuracy to be optimized jointly. Extensive experiments across language modeling, vision, and text-to-image generation tasks show that the method incurs less than 0.3% accuracy loss under 4-bit quantization, achieves over 60% memory reduction, and matches full-precision LoRA performance. The core contribution is a sine-activated quantization paradigm that provides a high-fidelity, low-bit compression framework for parameter-efficient fine-tuning.
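For context, the stable rank referred to above is the standard quantity (a background definition, not a result of the paper):

$$\operatorname{srank}(W) \;=\; \frac{\|W\|_F^2}{\|W\|_2^2} \;=\; \frac{\sum_i \sigma_i(W)^2}{\sigma_{\max}(W)^2},$$

so a rank-$r$ product $BA$ always satisfies $\operatorname{srank}(BA) \le r$, whereas an element-wise nonlinearity such as $\sin(\omega \cdot BA)$, with $\omega$ a fixed frequency, can raise the stable rank without adding any parameters.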
📝 Abstract
Low-Rank Adaptation (LoRA) has become a standard approach for parameter-efficient fine-tuning, offering substantial reductions in trainable parameters by modeling updates as the product of two low-rank matrices. While effective, the low-rank constraint inherently limits representational capacity, often resulting in reduced performance compared to full-rank fine-tuning. Recent work by Ji et al. (2025) addressed this limitation by applying a fixed-frequency sinusoidal transformation to low-rank adapters, increasing their stable rank without introducing additional parameters. This raises a crucial question: can the same sine-activated technique be applied successfully within the context of Post-Training Quantization (PTQ) to retain its benefits even after model compression? In this paper, we investigate this question by extending the sinusoidal transformation framework to quantized LoRA adapters. We develop a theoretical analysis showing that the stable rank of a quantized adapter is tightly linked to that of its full-precision counterpart, motivating the use of such rank-enhancing functions even under quantization. Our results demonstrate that the expressivity gains from a sinusoidal non-linearity persist after quantization, yielding highly compressed adapters with negligible loss in performance. We validate our approach across a range of fine-tuning tasks spanning language, vision, and text-to-image generation, achieving significant memory savings while maintaining competitive accuracy.
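To make the setup concrete, here is a minimal PyTorch sketch of a sine-activated LoRA layer combined with simple post-training quantization of the adapter factors, together with the stable-rank quantity the analysis tracks. The frequency `omega`, the `1/omega` scaling, and the naive symmetric quantizer are illustrative assumptions; the abstract does not specify the exact formulation used in the paper or in Ji et al. (2025).

```python
# Illustrative sketch only: `omega`, the 1/omega scaling, and the naive 4-bit
# quantizer are assumptions for demonstration, not the paper's exact method.
import torch


def stable_rank(W: torch.Tensor) -> float:
    """Stable rank ||W||_F^2 / ||W||_2^2, the quantity the analysis tracks."""
    fro_sq = W.pow(2).sum()
    spec_sq = torch.linalg.matrix_norm(W, ord=2) ** 2
    return (fro_sq / spec_sq).item()


def quantize_symmetric(W: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Naive symmetric uniform PTQ: quantize, then dequantize (fake-quant)."""
    qmax = 2 ** (bits - 1) - 1
    scale = W.abs().max() / qmax
    return torch.round(W / scale).clamp(-qmax, qmax) * scale


class SineLoRALinear(torch.nn.Module):
    """Frozen base layer plus a sine-activated low-rank update sin(omega * B @ A) / omega."""

    def __init__(self, base: torch.nn.Linear, rank: int = 8, omega: float = 100.0):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.A = torch.nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, rank))
        self.omega = omega

    def delta_w(self, A=None, B=None) -> torch.Tensor:
        # The element-wise sine raises the stable rank of the rank-r product B @ A.
        A = self.A if A is None else A
        B = self.B if B is None else B
        return torch.sin(self.omega * (B @ A)) / self.omega

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.delta_w().T


# After fine-tuning, quantize the adapter factors and compare stable ranks.
layer = SineLoRALinear(torch.nn.Linear(512, 512), rank=8)
torch.nn.init.normal_(layer.B, std=0.02)  # stand-in for a trained adapter
with torch.no_grad():
    dw_fp = layer.delta_w()
    dw_q = layer.delta_w(quantize_symmetric(layer.A), quantize_symmetric(layer.B))
print(f"stable rank  full-precision: {stable_rank(dw_fp):.2f}   4-bit: {stable_rank(dw_q):.2f}")
```

In a toy run like this the full-precision and 4-bit stable ranks typically stay close to each other, which is the qualitative behaviour the paper's theoretical analysis formalizes.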