A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation observed in multi-domain reinforcement learning, where training on one domain often harms others, a phenomenon that catastrophic forgetting and global gradient conflict explain only incompletely. The paper introduces a local perturbation theory that attributes cross-domain interference mainly to a second-order damage term concentrated in a low-dimensional conflict subspace along shared computation routes. Building on this insight, the authors study targeted recovery: because single-domain updates are sparse, either a training-free rollback on a sparse set of conflict coordinates or a brief domain-specific refresh can selectively restore performance without full retraining. After sequential training on Code→Math→QA→CW, a short Re-Math refresh raises Math from 57.66 to 66.04 while largely preserving the other domains and yields the best average score of 66.39, consistent with the local damage hypothesis.
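To make the "second-order damage" idea concrete, the following is a generic Taylor-expansion sketch, not the paper's own derivation. All symbols are assumptions introduced here for illustration: $J_A$ is the earlier domain's objective, $\theta$ the parameters after training on that domain, $\Delta$ the parameter change from later-domain training, $g_A$ and $H_A$ the gradient and Hessian of $J_A$ at $\theta$, and $P_S$ a projector onto a hypothetical shared conflict subspace $S$.

```latex
% Local expansion of the earlier-domain objective around \theta
J_A(\theta + \Delta) - J_A(\theta)
  \approx g_A^{\top} \Delta + \tfrac{1}{2}\, \Delta^{\top} H_A \Delta

% If the later update is nearly orthogonal to the earlier gradient,
% g_A^{\top} \Delta \approx 0, so the change is dominated by curvature:
J_A(\theta + \Delta) - J_A(\theta)
  \approx \tfrac{1}{2}\, \Delta^{\top} H_A \Delta

% If the curvature interaction is concentrated on a low-dimensional
% subspace S (assumed here), only the projected update matters:
\Delta^{\top} H_A \Delta \approx (P_S \Delta)^{\top} H_A \,(P_S \Delta)
```

Under these assumptions, near-orthogonal gradients do not rule out interference: the earlier domain degrades whenever $(P_S \Delta)^{\top} H_A (P_S \Delta) < 0$, and shrinking $P_S \Delta$ would be enough to undo most of that loss.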

📝 Abstract

Reinforcement learning (RL) post-training improves large language models (LLMs) on individual domains such as mathematical reasoning, code generation, question answering, and creative writing (CW), but training on one domain often degrades performance on others. Existing explanations based on catastrophic forgetting or global gradient conflict are incomplete: substantial interference can occur even when full-model gradients are nearly orthogonal. We show that single-domain RL produces sparse, small-magnitude parameter edits with weak overlap among top-changed neurons, while different domains still share substantial active computation routes on which update directions determine whether they act synergistically or conflict. Guided by this observation, we prove under a local perturbation model of multi-domain RL that later-domain training harms an earlier domain mainly through a second-order damage term, which under the observed sparse route structure concentrates in a low-dimensional shared conflict subspace. Moreover, a short domain refresh contracts the harmful component on this subspace, enabling selective recovery with limited collateral damage. Consistent with the theory, a brief Re-Math refresh after Code $\rightarrow$ Math $\rightarrow$ QA $\rightarrow$ CW recovers Math from 57.66 to 66.04 while largely preserving performance on the other domains, yielding the best average score of 66.39. Beyond refresh, a training-free rollback on a sparse proxy conflict coordinate set for the Math-QA pair partially restores Math, providing direct proxy-level evidence for localized damage. These results provide a localized mechanistic account of interference and recovery in multi-domain RL.
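The training-free rollback mentioned above can be illustrated with a toy NumPy sketch. This is not the paper's procedure: the selection rule (coordinates where the two domain updates have opposite signs, ranked by the size of their product) and the names `conflict_coordinates` and `rollback` are assumptions chosen only to show what reverting a sparse coordinate set looks like.

```python
import numpy as np


def conflict_coordinates(delta_math, delta_qa, k):
    """Return up to k indices where the two updates conflict most strongly.

    Assumed proxy: a coordinate conflicts when the Math and QA updates have
    opposite signs, i.e. their elementwise product is negative.
    """
    score = delta_math * delta_qa
    k = min(k, score.size)
    if k == 0:
        return np.array([], dtype=int)
    # Indices of the k smallest (most negative) products, in arbitrary order.
    idx = np.argpartition(score, k - 1)[:k]
    # Keep only genuine sign conflicts.
    return idx[score[idx] < 0]


def rollback(theta_after, delta_qa, idx):
    """Undo the QA update on the selected coordinates only (no training)."""
    restored = theta_after.copy()
    restored[idx] -= delta_qa[idx]
    return restored


# Toy parameters: base model, then a Math update, then a QA update.
theta_base = np.zeros(6)
delta_math = np.array([0.5, 0.0, -0.3, 0.2, 0.0, 0.1])
delta_qa = np.array([-0.4, 0.1, 0.3, 0.2, 0.0, 0.0])
theta_after = theta_base + delta_math + delta_qa

idx = conflict_coordinates(delta_math, delta_qa, k=2)
theta_restored = rollback(theta_after, delta_qa, idx)
print(sorted(idx.tolist()), theta_restored)
```

In this toy case only coordinates 0 and 2 are reverted, so the QA update survives everywhere else. For a real model the same idea would apply per parameter tensor, and the choice of proxy for the conflict set would need to follow the paper.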

Problem

Research questions and friction points this paper is trying to address.

cross-domain interference

multi-domain RL

catastrophic forgetting

parameter updates

shared computation routes

Innovation

Methods, ideas, or system contributions that make the work stand out.

local perturbation theory

multi-domain reinforcement learning

cross-domain interference