🤖 AI Summary
To address the prohibitive computational cost of full-parameter fine-tuning for large language models and the performance bottlenecks of existing parameter-efficient fine-tuning (PEFT) methods—particularly their inherent rank limitations—this paper proposes HyperAdapt. HyperAdapt introduces learnable diagonal scaling matrices applied independently to the rows and columns of pretrained weight matrices, enabling high-rank adaptation of an $n \times m$ matrix with only $n + m$ trainable parameters. Theoretically, HyperAdapt achieves a significantly higher upper bound on update rank than Low-Rank Adaptation (LoRA). Empirically, on GLUE, arithmetic reasoning, and commonsense reasoning benchmarks, HyperAdapt matches or approaches the performance of full fine-tuning and state-of-the-art PEFT methods on models up to 14B parameters, while reducing trainable parameters by one to three orders of magnitude—demonstrating both exceptional parameter efficiency and strong representational capacity.
📝 Abstract
Foundation models excel across diverse tasks, but adapting them to specialized applications often requires fine-tuning, an approach that is memory- and compute-intensive. Parameter-efficient fine-tuning (PEFT) methods mitigate this by updating only a small subset of weights. In this paper, we introduce HyperAdapt, a parameter-efficient fine-tuning method that significantly reduces the number of trainable parameters compared to state-of-the-art methods like LoRA. Specifically, HyperAdapt adapts a pre-trained weight matrix by applying row- and column-wise scaling through diagonal matrices, thereby inducing a high-rank update while requiring only $n+m$ trainable parameters for an $n \times m$ matrix. Theoretically, we establish an upper bound on the rank of HyperAdapt's updates, and empirically, we confirm that it consistently induces high-rank transformations across model layers. Experiments on GLUE, arithmetic reasoning, and commonsense reasoning benchmarks with models up to 14B parameters demonstrate that HyperAdapt matches or nearly matches the performance of full fine-tuning and state-of-the-art PEFT methods while using orders of magnitude fewer trainable parameters.
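The core mechanism described above—scaling the rows and columns of a frozen weight matrix with two learnable diagonal matrices—can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation; the shapes, initialization near identity, and variable names are assumptions for the example.

```python
import numpy as np

# Sketch of a row/column diagonal-scaling update (illustrative, not the
# paper's code). A frozen pretrained weight W (n x m) is adapted as
#   W' = diag(r) @ W @ diag(c),
# so only the n + m scalars in r and c are trainable, yet the induced
# update W' - W is generally high-rank.
rng = np.random.default_rng(0)
n, m = 8, 6
W = rng.standard_normal((n, m))            # frozen pretrained weight

# Learnable scales, initialized near 1 so training starts close to W.
r = 1.0 + 0.1 * rng.standard_normal(n)     # row scales (length n)
c = 1.0 + 0.1 * rng.standard_normal(m)     # column scales (length m)

# Broadcasting computes diag(r) @ W @ diag(c) without forming dense diagonals.
W_adapted = (r[:, None] * W) * c[None, :]

delta = W_adapted - W                      # the induced update
print("trainable params:", r.size + c.size)          # n + m = 14
print("rank of update:", np.linalg.matrix_rank(delta))
```

Note the contrast with LoRA: a rank-$k$ LoRA update to the same matrix costs $k(n+m)$ parameters and its rank is capped at $k$, whereas here the elementwise scaling update $\Delta W_{ij} = (r_i c_j - 1)\,W_{ij}$ is typically full-rank despite using only $n+m$ parameters.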