RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format

📅 2026-02-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
While large reasoning models exhibit strong multi-step reasoning capabilities, they often struggle to adhere to instruction constraints such as output formatting. To address this limitation, this work proposes RAIN-Merging, a gradient-free model merging method that preserves the original model's structured reasoning abilities through task vector analysis and null-space projection, while enhancing instruction following via an instruction-aware attention mechanism that enables module-level adaptive scaling. This approach significantly improves instruction adherence without compromising the underlying reasoning mechanisms. Experimental results demonstrate consistent gains across four instruction-following benchmarks, while maintaining or even improving performance on nine reasoning and general capability benchmarks. The effectiveness of RAIN-Merging is robust across diverse model scales and architectures.
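The null-space projection the summary describes can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the matrix sizes, the random stand-in calibration features, and all variable names are illustrative assumptions. The idea is that projecting the instruction-tuned task vector onto the null space of forward features collected at thinking special tokens leaves the module's outputs on those features unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for one linear module: the base large reasoning model (LRM)
# and the task vector of an instruction-tuned model (ITM), tau = W_itm - W_base.
d_out, d_in = 8, 6
w_lrm = rng.normal(size=(d_out, d_in))
tau_itm = 0.1 * rng.normal(size=(d_out, d_in))

# Stand-ins for forward features entering this module at "thinking" special tokens,
# collected on a small reasoning calibration set (4 features, rank 4 < d_in).
X = rng.normal(size=(d_in, 4))

# Orthogonal projector onto the null space of the calibration features:
# P = I - X (X^T X)^{-1} X^T, so (tau @ P) @ X == 0.
P = np.eye(d_in) - X @ np.linalg.pinv(X.T @ X) @ X.T
tau_proj = tau_itm @ P

# Gradient-free merge: add only the component that cannot disturb
# the module's behaviour on the calibration features.
w_merged = w_lrm + tau_proj

# The merged module matches the base LRM exactly on the calibration features.
assert np.allclose(w_merged @ X, w_lrm @ X)
```

On any input lying in the span of the calibration features, the merged module behaves exactly like the base LRM; the instruction-following update only acts on directions orthogonal to them.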

πŸ“ Abstract
Large reasoning models (LRMs) excel at long chains of reasoning but often fail to faithfully follow instructions regarding output format, constraints, or specific requirements. We investigate whether this gap can be closed by integrating an instruction-tuned model (ITM) into an LRM. Analyzing their differences in parameter space, namely task vectors, we find that their principal subspaces are nearly orthogonal across key modules, suggesting that a lightweight merge is possible with minimal interference. However, we also demonstrate that naive merges are fragile because they overlook the output format mismatch between LRMs (with explicit thinking and response segments) and ITMs (answer-only). We introduce RAIN-Merging (Reasoning-Aware Instruction-attention guided Null-space projection Merging), a gradient-free method that integrates instruction following while preserving the thinking format and reasoning performance. First, with a small reasoning calibration set, we project the ITM task vector onto the null space of forward features at thinking special tokens, which preserves the LRM's structured reasoning mechanisms. Second, using a small instruction calibration set, we estimate instruction attention to derive module-specific scaling that amplifies instruction-relevant components and suppresses leakage. Across four instruction-following benchmarks and nine reasoning & general capability benchmarks, RAIN-Merging substantially improves instruction adherence while maintaining reasoning quality. The gains are consistent across model scales and architectures, translating to improved performance in agent settings.
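The abstract's orthogonality observation can be checked numerically: the principal subspace of a task vector is spanned by its top right singular vectors, and the principal angles between two subspaces are read off from the singular values of the product of their orthonormal bases. The sketch below is a toy illustration under assumed sizes and random task vectors, not the paper's analysis code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical task vectors for the same module (toy 16x16 weights):
# one from reasoning tuning, one from instruction tuning.
tau_reason = rng.normal(size=(16, 16))
tau_instr = rng.normal(size=(16, 16))

def principal_subspace(tau, k):
    """Orthonormal basis of the top-k right singular directions of a task vector."""
    _, _, vt = np.linalg.svd(tau)
    return vt[:k].T  # shape (d, k)

U = principal_subspace(tau_reason, k=3)
V = principal_subspace(tau_instr, k=3)

# Cosines of the principal angles between the two subspaces are the singular
# values of U^T V. Values near 0 mean the subspaces are nearly orthogonal,
# i.e. the two task vectors act on largely disjoint directions and can be
# merged with little interference.
cosines = np.linalg.svd(U.T @ V, compute_uv=False)
print(np.round(cosines, 3))
```

For genuinely aligned models one would compute these cosines per module (attention and MLP projections) and compare them against a random-subspace baseline before concluding that interference is small.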
Problem

Research questions and friction points this paper is trying to address.

instruction following
large reasoning models
output format
reasoning format preservation
instruction adherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

gradient-free merging
instruction following
reasoning format preservation
null-space projection
task vector alignment
Zhehao Huang
Institute of Image Processing and Pattern Recognition, School of Automation and Intelligent Sensing, Shanghai Jiao Tong University
Yuhang Liu
The University of Adelaide
Representation Learning, LLMs, Latent Variable Models, Responsible AI
Baijiong Lin
Ph.D. Student, The Hong Kong University of Science and Technology (Guangzhou)
RLVR, LLM Post-Training, Multi-Task Learning
Yixin Lou
Institute of Image Processing and Pattern Recognition, School of Automation and Intelligent Sensing, Shanghai Jiao Tong University
Zhengbao He
Institute of Image Processing and Pattern Recognition, School of Automation and Intelligent Sensing, Shanghai Jiao Tong University
Hanling Tian
Institute of Image Processing and Pattern Recognition, School of Automation and Intelligent Sensing, Shanghai Jiao Tong University
Tao Li
Shanghai Jiao Tong University
machine learning, optimization
Xiaolin Huang
Professor, Shanghai Jiao Tong University
machine learning, kernel method, deep neural network training, piecewise linear model