🤖 AI Summary
This work proposes D2-LoRA, a parameter-efficient fine-tuning method designed for scenarios with limited training data and compute, where existing approaches often struggle to balance performance, stability, and mergeability. D2-LoRA combines signed low-rank residual updates, which carry both directional and differential information, with a column-wise norm projection that constrains weight updates. The result is an adapter that is accurate, exhibits low training variance, and supports algebraic merging without introducing inference latency. Evaluated across eight question answering and reading comprehension benchmarks, D2-LoRA reaches an average accuracy of 76.4%, outperforming LoRA by 2.2 percentage points, while reducing training instability by 36%. Merging the adapter recovers roughly 1.91× the evaluation throughput of the unmerged model, with negligible numerical degradation of about 0.03 percentage points.
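
Read literally, the update sketched above admits a simple algebraic form; the notation below ($W_0$, $B^{\pm}$, $A^{\pm}$, and the projection $\Pi_{\text{col}}$) is illustrative and not necessarily the paper's own:

$$
\Delta W = B^{+}A^{+} - B^{-}A^{-}, \qquad W_{\text{merged}} = \Pi_{\text{col}}\!\bigl(W_0 + \Delta W\bigr),
$$

where $\Pi_{\text{col}}$ rescales each column of the updated matrix so that its norm stays close to the corresponding column norm of the frozen weight $W_0$. Because $W_{\text{merged}}$ is a single dense matrix, inference after merging costs no more than the base model.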
📝 Abstract
We systematically investigate the parameter-efficient fine-tuning design space under practical data and compute constraints, and propose D2-LoRA. D2-LoRA achieves 76.4% average accuracy across eight question answering and reading comprehension benchmarks using only 5k training samples per task and two epochs, while preserving algebraic mergeability at inference with near-exact numerical equivalence. The method combines signed low-rank residual updates, comprising additive and subtractive components, with a train-time column-wise projection that keeps each column close to its original norm. After training, the adapter is merged into a single weight matrix, incurring no additional inference latency. Compared with LoRA, D2-LoRA improves average accuracy by 2.2 percentage points; at matched parameter counts (LoRA rank 2r versus D2-LoRA rank r), the improvement is 1.6 points, indicating gains from architectural design rather than increased parameterization. Compared with DoRA, it matches or exceeds performance on most tasks. Beyond QA and reading comprehension, D2-LoRA improves generative tasks (+1.2 ROUGE-L and +1.1% win rate) and exhibits 36% lower training volatility. The merge preserves numerical fidelity (mean gap of about 0.03 percentage points) and recovers roughly 1.91× evaluation throughput. Training overhead is 19%, comparable to DoRA's, and decreases with longer input sequences. We provide a geometric analysis explaining how the projection stabilizes training, together with ablation studies isolating the contribution of each design component.
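
As a concrete illustration, the sketch below implements one plausible reading of the abstract: signed low-rank residuals with an additive and a subtractive branch, a train-time column-wise norm projection, and a post-training merge into a single dense matrix. Class and parameter names (`D2LoRALinear`, `eps`, the exact projection rule) are assumptions for illustration, not the authors' reference code.

```python
# Minimal sketch reconstructed from the abstract's description of D2-LoRA:
# signed low-rank residuals, a train-time column-wise norm projection, and a
# post-training merge. Names, shapes, and the exact projection rule are
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class D2LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, eps: float = 0.1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pretrained weight stays frozen

        out_f, in_f = base.weight.shape
        # Signed low-rank residual: an additive and a subtractive component.
        self.A_pos = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B_pos = nn.Parameter(torch.zeros(out_f, rank))
        self.A_neg = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B_neg = nn.Parameter(torch.zeros(out_f, rank))

        self.eps = eps  # allowed relative drift of each column norm (assumed)
        # Per-column norms of the frozen weight, used as projection targets.
        self.register_buffer("col_norms", base.weight.norm(dim=0))

    def delta(self) -> torch.Tensor:
        # Low-rank update with explicit signed parts: B+ A+ - B- A-.
        return self.B_pos @ self.A_pos - self.B_neg @ self.A_neg

    def projected_weight(self) -> torch.Tensor:
        # Train-time column-wise projection: rescale each updated column so its
        # norm stays within a band around the original column norm (one
        # plausible reading of "keeps each column close to its original norm").
        w = self.base.weight + self.delta()
        norms = w.norm(dim=0)
        lo = (1.0 - self.eps) * self.col_norms
        hi = (1.0 + self.eps) * self.col_norms
        target = torch.maximum(torch.minimum(norms, hi), lo)
        return w * (target / (norms + 1e-8))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.projected_weight(), self.base.bias)

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        # Fold the adapter into a single dense matrix: zero extra inference cost.
        merged = nn.Linear(self.base.in_features, self.base.out_features,
                           bias=self.base.bias is not None)
        merged.weight.copy_(self.projected_weight())
        if self.base.bias is not None:
            merged.bias.copy_(self.base.bias)
        return merged
```

In this sketch, `projected_weight()` is recomputed on every forward pass during training; after training, calling `merge()` returns a plain `nn.Linear` that can replace the adapted module, which is what lets the merged model run at base-model throughput.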