GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR

📅 2026-01-14

🤖 AI Summary

Existing parameter-efficient fine-tuning methods applied to reinforcement learning with verifiable rewards (RLVR) often neglect the geometric structure of its optimization dynamics, which leads to spectral collapse and training instability. This work proposes a geometry-aware low-rank adaptation method that, for the first time, incorporates the anisotropy and compressibility of RLVR update subspaces into adapter design. Using singular value decomposition (SVD), it initializes adapters along the dominant directions of a geometrically constrained subspace and freezes the residual components. This preserves the pre-trained geometric structure while keeping computation in dense, GPU-friendly operations. Evaluated on Qwen and Llama models on mathematical reasoning benchmarks, the method achieves state-of-the-art results, consistently outperforming established low-rank baselines, and shows stronger generalization and resilience to catastrophic forgetting on out-of-domain tasks.
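To make "anisotropy", "compressibility", and "spectral collapse" more concrete, the sketch below computes two generic spectral statistics of a weight-update matrix ΔW: the share of squared singular-value energy captured by the top-k directions, and the entropy-based effective rank. These are common diagnostics chosen here for illustration only; the summary does not say which measures the paper uses, and the function name `spectral_profile` is hypothetical.

```python
import numpy as np


def spectral_profile(delta_w: np.ndarray, k: int) -> tuple[float, float]:
    """Return (top-k energy share, effective rank) of an update matrix.

    Illustrative diagnostic, not the paper's metric. Assumes delta_w is not all zeros.
    - top-k energy share: fraction of sum(s_i^2) held by the k largest singular
      values; values near 1 mean the update is compressible into k directions.
    - effective rank: exp of the Shannon entropy of the normalized singular
      values; values near 1 mean the spectrum has collapsed onto one direction.
    """
    s = np.linalg.svd(delta_w, compute_uv=False)  # singular values, descending
    energy = s ** 2
    top_k_share = float(energy[:k].sum() / energy.sum())
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so the logarithm is defined
    eff_rank = float(np.exp(-(p * np.log(p)).sum()))
    return top_k_share, eff_rank


# Usage: an (almost) rank-2 update versus an isotropic one.
rng = np.random.default_rng(0)
low_rank_update = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64))
print(spectral_profile(low_rank_update, k=2))  # share close to 1, effective rank at most about 2
print(spectral_profile(np.eye(64), k=2))       # share 2/64, effective rank 64 (up to rounding)
```

A strongly anisotropic update concentrates its energy in a few directions, which is what makes a low-rank adapter a good fit; a collapsing spectrum shows up as an effective rank that shrinks toward 1.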

📝 Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) is crucial for advancing large-scale reasoning models. However, existing parameter-efficient methods, such as PiSSA and MiLoRA, are designed for Supervised Fine-Tuning (SFT) and do not account for the distinct optimization dynamics and geometric structures of RLVR. Applying these methods directly leads to spectral collapse and optimization instability, which severely limit model performance. Meanwhile, alternative approaches that leverage update sparsity encounter significant efficiency bottlenecks on modern hardware due to unstructured computations. To address these challenges, we propose GeoRA (Geometry-Aware Low-Rank Adaptation), which exploits the anisotropic and compressible nature of RL update subspaces. GeoRA initializes adapters by extracting principal directions via Singular Value Decomposition (SVD) within a geometrically constrained subspace while freezing the residual components. This method preserves the pre-trained geometric structure and enables efficient GPU computation through dense operators. Experiments on Qwen and Llama demonstrate that GeoRA mitigates optimization bottlenecks caused by geometric misalignment. It consistently outperforms established low-rank baselines on key mathematical benchmarks, achieving state-of-the-art (SOTA) results. Moreover, GeoRA shows superior generalization and resilience to catastrophic forgetting in out-of-domain tasks.
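The abstract describes initializing adapters from principal SVD directions and freezing the residual. The NumPy sketch below shows the generic PiSSA-style version of that split; it is not GeoRA itself. The abstract does not specify how the "geometrically constrained subspace" is chosen, so this sketch simply takes the top-r directions of the full weight matrix, and the names `svd_adapter_init` and `adapted_forward` are hypothetical. The forward pass also illustrates the point about dense operators: both paths are ordinary matrix multiplications rather than unstructured sparse updates.

```python
import numpy as np


def svd_adapter_init(w: np.ndarray, r: int) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Split W (out x in) into a rank-r adapter A @ B plus a frozen residual.

    PiSSA-style sketch: with W = U diag(S) V^T, the top-r principal directions
    initialize A (out x r) and B (r x in) so that A @ B is the best rank-r
    approximation of W. The remaining minor components form the residual.
    GeoRA's geometric constraint on the subspace is NOT modeled here.
    """
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    sqrt_s = np.sqrt(s[:r])
    a = u[:, :r] * sqrt_s          # scale the r columns of U by sqrt(singular values)
    b = sqrt_s[:, None] * vt[:r]   # scale the r rows of V^T by sqrt(singular values)
    w_res = w - a @ b              # frozen residual (minor singular components)
    return a, b, w_res


def adapted_forward(x: np.ndarray, a: np.ndarray, b: np.ndarray, w_res: np.ndarray) -> np.ndarray:
    """Compute y = x W^T as a frozen dense path plus a low-rank dense path."""
    return x @ w_res.T + (x @ b.T) @ a.T


rng = np.random.default_rng(0)
w = rng.standard_normal((16, 12))  # stand-in for a pretrained weight (out=16, in=12)
a, b, w_res = svd_adapter_init(w, r=4)
x = rng.standard_normal((5, 12))
# At initialization the adapted layer reproduces the original layer (up to
# floating-point error); in a training framework only a and b would get gradients.
print(np.allclose(adapted_forward(x, a, b, w_res), x @ w.T))  # True
```

Splitting the square root of the singular values evenly between A and B keeps both factors on a similar scale; this is one common convention, not necessarily the one GeoRA uses.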

Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning with Verifiable Rewards

Parameter-Efficient Adaptation

Optimization Instability

Spectral Collapse

Geometric Structure

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-Aware Adaptation

Low-Rank Adaptation

Reinforcement Learning with Verifiable Rewards