AI Summary
This work addresses catastrophic forgetting in large language models during continual domain adaptation, which often arises from introducing new weight directions. Inspired by neural gain modulation in neuroscience, the authors propose GAIN, a lightweight method that introduces learnable diagonal matrices \( S \) in the attention output projection and feed-forward networks. Rather than adding new directions, GAIN multiplicatively rescales pre-existing weights (\( W_{\text{new}} = S \odot W \)) to reweight features. Requiring only 46K-230K additional parameters, GAIN can be fused into the pretrained weights, incurring zero inference overhead and maintaining compatibility across architectures. Evaluated on eight sequential domain tasks, GAIN-FFN substantially mitigates forgetting: it improves validation perplexity on previously learned domains by 7-13% where LoRA degrades it by 18-36%, and it limits the BoolQ accuracy drop to just 0.8%, compared to LoRA's 14.9%.
Abstract
Adapting LLMs to new domains causes forgetting because standard methods (full fine-tuning, LoRA) inject new directions into the weight space. We propose GAIN, which re-emphasizes existing features through multiplicative modulation \( W_{\text{new}} = S \odot W \). The learned diagonal matrix \( S \) is applied to the attention output projection and optionally the FFN. The principle mirrors gain modulation in neuroscience, where neurons adapt to context by scaling response strength while preserving selectivity.
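Since GAIN's core operation is just a per-feature rescaling that can later be folded back into the weights, a minimal sketch may help. The code below is illustrative only, not the authors' implementation: the names `modulate` and `project` are ours, and we interpret the diagonal \( S \) as one learnable gain per output row of \( W \).

```python
# Hypothetical sketch of GAIN-style multiplicative modulation (not the paper's code).
# We read W_new = S (*) W, with S diagonal, as one gain per output feature,
# scaling the corresponding row of a frozen pretrained weight W.

def modulate(W, gains):
    """Return W_new, where row i of W is rescaled by gains[i]."""
    return [[g * w for w in row] for g, row in zip(gains, W)]

def project(x, W):
    """Plain linear projection y = W x (no bias), as in an attention output layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# A frozen 3x3 "pretrained" weight and an input vector.
W = [[0.5, -1.0, 2.0],
     [1.5,  0.0, -0.5],
     [-2.0, 1.0,  1.0]]
x = [1.0, 2.0, 3.0]

# Gains initialize to 1.0, so adaptation starts exactly at the pretrained model.
gains = [1.0, 1.0, 1.0]
assert project(x, modulate(W, gains)) == project(x, W)

# After training, the gains are fused into the weights once ("absorbed"),
# so inference runs a plain matmul with zero extra parameters or latency.
gains = [1.1, 0.9, 1.2]
W_fused = modulate(W, gains)
y_scaled = [g * yi for g, yi in zip(gains, project(x, W))]
assert all(abs(a - b) < 1e-12 for a, b in zip(project(x, W_fused), y_scaled))
```

Note the parameter economy this implies: one gain per output feature of the modulated matrices, which is consistent with the tens-of-thousands-of-parameters budget quoted above, versus LoRA's additive low-rank update that introduces new directions.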
We evaluate GAIN on five models from four families (774M to 70B), adapting sequentially across eight domains. GAIN-FFN matches LoRA's in-domain adaptation, but their effects on previously trained domains are opposite: GAIN-FFN improves them by 7-13% (validation PPL), while LoRA degrades them by 18-36%. Downstream accuracy confirms the pattern: for example, after seven sequential adaptations on Qwen2.5, GAIN-FFN degrades BoolQ by only 0.8% while LoRA damages it by 14.9%. GAIN adds 46K-230K parameters per model and can be absorbed into the pretrained weights for zero inference cost.