1LoRA: Summation Compression for Very Low-Rank Adaptation

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of parameter-efficient fine-tuning (PEFT) for large language models (LLMs) under extremely low-rank constraints, this paper proposes 1LoRA: a novel method that introduces only a single learnable vector per linear layer as a decompressor, coupled with feature summation to achieve fixed-ratio compression—reducing trainable parameters per layer to the theoretical minimum (rank = 1). 1LoRA is the first approach to enable full-layer, single-vector decomposition in the ultra-low-rank regime, departing from conventional matrix-decomposition-based PEFT paradigms (e.g., LoRA) that require higher ranks. It supports balanced, end-to-end fine-tuning across the entire network—not restricted to attention layers. Extensive experiments demonstrate that 1LoRA consistently outperforms state-of-the-art methods—including LoRA, VeRA, and MoRA—across multiple downstream tasks. Remarkably, it achieves superior performance while significantly reducing trainable parameters, GPU memory footprint, and computational overhead.

📝 Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods have transformed the approach to fine-tuning large models for downstream tasks by enabling the adjustment of significantly fewer parameters than those in the original model matrices. In this work, we study the "very low rank regime", where we fine-tune the lowest number of parameters per linear layer for each considered PEFT method. We propose 1LoRA (Summation Low-Rank Adaptation), a compute-, parameter-, and memory-efficient fine-tuning method which uses the feature sum as a fixed compression and a single trainable vector as the decompression. Unlike state-of-the-art PEFT methods such as LoRA, VeRA, and the recent MoRA, 1LoRA uses fewer parameters per layer, reducing the memory footprint and the computational cost. We extensively evaluate our method against state-of-the-art PEFT methods on multiple fine-tuning tasks, and show that our method not only outperforms them, but is also more parameter-, memory-, and computationally efficient. Moreover, thanks to its memory efficiency, 1LoRA allows fine-tuning more evenly across layers, instead of focusing on specific ones (e.g. attention layers), improving performance further.
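The mechanism described in the abstract — a fixed feature-sum compression followed by a single trainable decompression vector — can be sketched as a rank-1 adapted linear layer. This is an interpretation of the abstract, not the authors' code; the class name and update rule `h = W x + v · sum(x)` are assumptions based on the stated design.

```python
import numpy as np

class OneLoRALinear:
    """Sketch of a 1LoRA-style adapted linear layer (interpretation of the
    abstract, not the official implementation). The frozen pretrained
    weight W is adapted by a rank-1 update whose compression is the fixed
    feature sum and whose decompression is a single trainable vector v:

        h = W x + b + v * sum(x)

    Only v (d_out values) is trained, so the adapter adds the minimum
    possible parameters per linear layer.
    """

    def __init__(self, W, b=None):
        self.W = W                             # frozen weight, shape (d_out, d_in)
        self.b = b if b is not None else np.zeros(W.shape[0])
        self.v = np.zeros(W.shape[0])          # trainable vector, zero-init so
                                               # the adapted layer starts identical
                                               # to the frozen one

    def forward(self, x):
        # x: shape (d_in,) or (batch, d_in)
        base = x @ self.W.T + self.b
        s = x.sum(axis=-1, keepdims=True)      # fixed compression: feature sum
        return base + s * self.v               # decompression by trainable v
```

After training, the adapter can be merged into the frozen weight as `W' = W + np.outer(v, np.ones(d_in))`, since summing the features is a fixed rank-1 projection.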
Problem

Research questions and friction points this paper is trying to address.

Develops a parameter-efficient fine-tuning method for large models
Reduces memory and computational costs in model fine-tuning
Improves performance by enabling even fine-tuning across layers
Innovation

Methods, ideas, or system contributions that make the work stand out.

1LoRA uses summation for fixed compression.
Single trainable vector enables efficient decompression.
Reduces parameters, memory, and computational costs.
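The per-layer savings claimed above can be illustrated with a back-of-the-envelope count. The layer size below is hypothetical (not from the paper), and the LoRA formula is the standard `r * (d_in + d_out)` for its two factor matrices:

```python
def trainable_params_per_layer(d_in, d_out, r=1):
    """Illustrative per-layer trainable-parameter counts.

    LoRA trains two factors A (r x d_in) and B (d_out x r);
    1LoRA trains only a single vector v of size d_out.
    """
    return {
        "LoRA": r * (d_in + d_out),
        "1LoRA": d_out,
    }

# Hypothetical 4096 x 4096 projection, LoRA at its minimum rank r = 1:
counts = trainable_params_per_layer(4096, 4096, r=1)
print(counts)  # {'LoRA': 8192, '1LoRA': 4096}
```

Even against rank-1 LoRA, the single-vector adapter halves the trainable parameters on a square layer, which is what makes adapting every linear layer (not just attention) affordable.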