Hybrid and Unitary Fine-Tuning of Large Language Models: Methods and Benchmarking under Resource Constraints

📅 2025-07-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address high computational cost, gradient instability, and excessive memory consumption in large language model (LLM) fine-tuning, this paper proposes a novel hybrid parameter-efficient fine-tuning (PEFT) method. Our approach integrates the orthogonality-enforced stability of BOFT with the gradient-aligned convergence of LoRA-GA, and—uniquely—incorporates a unitary RNN mechanism into the Transformer architecture to enhance gradient propagation stability. We further design a gradient-norm-based dynamic hierarchical update scheme and introduce unit-norm constraints into PEFT optimization for the first time. Evaluated across model scales from 7B to 405B, our method consistently outperforms existing PEFT baselines on four major benchmarks—GLUE, GSM8K, MT-Bench, and HumanEval—achieving accuracy close to full fine-tuning while accelerating training by 2.1× and reducing GPU memory usage by 50%.

📝 Abstract
Fine-tuning large language models (LLMs) remains a computational bottleneck due to their scale and memory demands. This paper presents a comprehensive evaluation of parameter-efficient fine-tuning (PEFT) techniques, including LoRA, BOFT, LoRA-GA, and uRNN, and introduces a novel hybrid strategy that dynamically integrates BOFT's orthogonal stability with LoRA-GA's gradient-aligned rapid convergence. By computing per-layer adaptive updates guided by gradient norms, the hybrid method achieves superior convergence efficiency and generalization across diverse tasks. We also explore, for the first time, the adaptation of unitary RNN (uRNN) principles to transformer-based LLMs, enhancing gradient stability through structured unitary constraints. Empirical evaluations on four benchmarks -- GLUE, GSM8K, MT-Bench, and HumanEval -- using models ranging from 7B to 405B parameters demonstrate that our hybrid method consistently outperforms individual PEFT baselines, approaching full fine-tuning accuracy while reducing training time by up to 2.1 times and memory usage by 50 percent. These findings establish the hybrid approach as a practical and scalable fine-tuning solution for real-world deployment of LLMs under resource constraints.
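The per-layer adaptive scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `allocate_layer_updates` and the proportional-budget rule are assumptions; the paper only states that per-layer updates are guided by gradient norms.

```python
import torch

def allocate_layer_updates(params, total_budget=1.0, eps=1e-8):
    """Assign each layer a share of the update budget proportional to
    its gradient norm, so layers with larger gradients update more.

    Hypothetical sketch of a gradient-norm-guided per-layer scheme;
    the paper's actual allocation rule may differ.
    """
    norms = torch.tensor([p.grad.norm().item() for p in params])
    weights = norms / (norms.sum() + eps)  # normalize to a distribution
    return {i: (total_budget * w).item() for i, w in enumerate(weights)}
```

In a training loop, the returned per-layer scales could modulate the learning rate or adapter step size of each layer after every backward pass.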
Problem

Research questions and friction points this paper is trying to address.

Efficient fine-tuning of large language models under resource constraints
Dynamic hybrid strategy combining BOFT and LoRA-GA for better convergence
Adapting unitary RNN principles to enhance transformer-based LLM stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid BOFT and LoRA-GA for efficient fine-tuning
Adaptive per-layer updates via gradient norms
Unitary RNN principles for transformer stability
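The unit-norm constraint mentioned in the summary can be illustrated with a simple projection step. This is a sketch under assumptions: the helper `enforce_unit_norm` is hypothetical, and the paper's exact constraint (e.g., full unitarity of adapter matrices) may be stricter than the per-column normalization shown here.

```python
import torch

def enforce_unit_norm(weight, dim=0, eps=1e-8):
    """Project each column of a PEFT factor back onto the unit sphere.

    Illustrative stand-in for a unit-norm constraint applied after an
    optimizer step; not the paper's exact projection.
    """
    with torch.no_grad():
        norms = weight.norm(dim=dim, keepdim=True).clamp_min(eps)
        weight.div_(norms)  # in-place rescale so each column has norm 1
    return weight
```

Applied after each optimizer step, such a projection keeps adapter columns on the unit sphere, which is one simple way to bound the spectral growth that destabilizes gradients.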
Haomin Qi
University of California, San Diego
Generative AI, Deep Learning, Natural Language Processing
Zihan Dai
Information Engineering, The Chinese University of Hong Kong, Hong Kong
Chengbo Huang
Information Engineering, The Chinese University of Hong Kong, Hong Kong