GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR

📅 2026-01-14
🤖 AI Summary
Existing parameter-efficient fine-tuning methods in reinforcement learning with verifiable rewards (RLVR) often neglect the geometric structure of optimization dynamics, leading to spectral collapse and training instability. This work proposes a geometry-aware low-rank adaptation method that, for the first time, incorporates the anisotropy and compressibility of RLVR update subspaces into adapter design. By leveraging singular value decomposition (SVD), the approach initializes adapters along dominant directions within a geometrically constrained subspace and freezes residual components, thereby balancing optimization stability and computational efficiency. Evaluated on Qwen and Llama models for mathematical reasoning tasks, the method achieves state-of-the-art performance, significantly outperforming existing low-rank approaches while demonstrating superior generalization and robustness against catastrophic forgetting.

📝 Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) is crucial for advancing large-scale reasoning models. However, existing parameter-efficient methods, such as PiSSA and MiLoRA, are designed for Supervised Fine-Tuning (SFT) and do not account for the distinct optimization dynamics and geometric structures of RLVR. Applying these methods directly leads to spectral collapse and optimization instability, which severely limit model performance. Meanwhile, alternative approaches that leverage update sparsity encounter significant efficiency bottlenecks on modern hardware due to unstructured computations. To address these challenges, we propose GeoRA (Geometry-Aware Low-Rank Adaptation), which exploits the anisotropic and compressible nature of RL update subspaces. GeoRA initializes adapters by extracting principal directions via Singular Value Decomposition (SVD) within a geometrically constrained subspace while freezing the residual components. This method preserves the pre-trained geometric structure and enables efficient GPU computation through dense operators. Experiments on Qwen and Llama demonstrate that GeoRA mitigates optimization bottlenecks caused by geometric misalignment. It consistently outperforms established low-rank baselines on key mathematical benchmarks, achieving state-of-the-art (SOTA) results. Moreover, GeoRA shows superior generalization and resilience to catastrophic forgetting in out-of-domain tasks.
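The abstract describes the initialization only at a high level: principal directions are extracted by SVD and become the adapter, while the remaining components are frozen. The following is a minimal sketch of that general pattern, assuming a plain truncated SVD of a single weight matrix. It does not implement the geometrically constrained subspace, which the abstract does not specify, so it should be read as an illustration of SVD-based adapter initialization rather than as GeoRA itself. The function name `svd_split` and the matrix sizes are hypothetical.

```python
import numpy as np


def svd_split(weight, rank):
    """Split a weight matrix into low-rank factors plus a residual.

    Illustrative only: a plain truncated SVD, not the constrained
    subspace selection described in the paper.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    # Share each singular value evenly between the two factors.
    b = u[:, :rank] * sqrt_s            # shape (out_features, rank)
    a = sqrt_s[:, None] * vt[:rank]     # shape (rank, in_features)
    # Everything outside the top-`rank` directions; this part would
    # stay frozen while only `b` and `a` are trained.
    residual = weight - b @ a
    return b, a, residual


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))       # stand-in for a pretrained weight
b, a, residual = svd_split(w, rank=4)

# At initialization the adapted layer reproduces the original weight.
reconstructed = residual + b @ a
```

Because `residual + b @ a` equals the original matrix at initialization, the model's outputs are unchanged before training starts, and all three pieces are dense matrices that map onto standard GPU matrix multiplications.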
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning with Verifiable Rewards
parameter-efficient adaptation
optimization instability
spectral collapse
geometric structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-Aware Adaptation
Low-Rank Adaptation
Reinforcement Learning with Verifiable Rewards
Singular Value Decomposition
Optimization Stability
Jiaying Zhang
Meituan
Lei Shi
Meituan
Jiguo Li
Professor of Computer Science, Fujian Normal University
cryptography theory and technology, cryptography protocol, network security, authentication, cloud computing security
Jun Xu
Meituan
Jiuchong Gao
Meituan
Jinghua Hao
Meituan
Renqing He
Meituan