🤖 AI Summary
This work addresses the “linear ceiling” limitation of Low-Rank Adaptation (LoRA) in complex reasoning tasks, where performance saturates quickly even as rank increases, owing to LoRA's purely linear structure. To overcome this, we propose Nonlinear Rank Adaptation (NoRA), which introduces parallel SiLU-gated pathways and structured dropout alongside the low-rank weight updates. This design activates the tail components of the singular value spectrum, mitigating rank collapse and transcending the expressivity constraints of linear adaptation. Both theoretical analysis and experiments demonstrate that NoRA substantially enhances representational capacity and spectral efficiency: on SlimOrca, NoRA at rank 64 achieves a perplexity (PPL) of 3.89, outperforming LoRA at rank 512 (PPL 3.90); on MathInstruct, NoRA attains PPL 1.97, well below LoRA's saturated performance (PPL 2.07).
📝 Abstract
Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning (PEFT). However, it faces a critical “linear ceiling” in complex reasoning tasks: simply increasing the rank yields diminishing returns due to intrinsic linear constraints. We introduce NoRA (Nonlinear Rank Adaptation), a weight-level parallel adapter that injects SiLU gating and structured dropout to induce manifold expansion. On the SlimOrca benchmark, NoRA breaks this linear barrier: at rank 64 (PPL 3.89), NoRA outperforms LoRA at rank 512 (PPL 3.90), demonstrating superior spectral efficiency. This advantage generalizes to mathematical reasoning, where NoRA achieves a perplexity of 1.97 on MathInstruct, significantly surpassing LoRA's saturation point of 2.07. Mechanistic analysis via Singular Value Decomposition (SVD) confirms that NoRA activates the dormant tail of the singular value spectrum, effectively preventing the rank collapse observed in linear methods.
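To make the contrast with plain LoRA concrete, here is a minimal, dependency-free sketch of a forward pass with a nonlinear low-rank pathway. The exact NoRA parameterization is not specified above, so this is only one plausible reading: the SiLU gate is placed inside the low-rank bottleneck (`y = Wx + α·B·silu(Ax)`), and the structured dropout used during training is omitted for clarity. The function and matrix names (`nora_forward`, `W`, `A`, `B`, `alpha`) are illustrative, not the paper's API.

```python
import math

def silu(x: float) -> float:
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def matvec(M, v):
    # Plain matrix-vector product over nested lists.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def nora_forward(x, W, A, B, alpha=1.0):
    """Hypothetical NoRA-style forward pass (sketch, not the paper's code):
        y = W x + alpha * B silu(A x)
    W: frozen d x d base weight; A: r x d down-projection;
    B: d x r up-projection. Unlike LoRA's linear delta B A x, the SiLU
    gate makes the update nonlinear in x. Structured dropout on the
    bottleneck (used in training per the abstract) is omitted here.
    """
    base = matvec(W, x)                      # frozen base path
    z = [silu(t) for t in matvec(A, x)]      # gated low-rank bottleneck
    delta = matvec(B, z)                     # project back up to d dims
    return [b + alpha * d for b, d in zip(base, delta)]
```

For example, with `W` the 2x2 identity, rank r = 1, `A = [[0.5, 0.5]]`, `B = [[1.0], [1.0]]`, and `x = [1.0, 2.0]`, the bottleneck activation is `silu(1.5)`, so each output coordinate is shifted by that single gated scalar; dropping the `silu` recovers an ordinary rank-1 LoRA update.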