Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal large language models (MLLMs) suffer from catastrophic forgetting during visual instruction tuning, which severely degrades the base model's language capabilities. To address this, the paper proposes Locate-then-Merge, a training-free parameter fusion framework, and introduces Neuron-Fusion, a neuron-level merging strategy. Through parameter-shift analysis, the method identifies vision-specialized neurons and general-purpose language neurons; fusion then preserves the influence of the former while attenuating the latter via weighted interpolation. Evaluated across 13 language and vision benchmarks, Neuron-Fusion consistently outperforms existing model merging approaches. It significantly reduces context hallucination while retaining strong visual understanding, restoring the base LLM's grammatical correctness, logical consistency, and instruction-following fidelity. Crucially, this is accomplished at zero training cost, balancing multimodal adaptability and language fidelity without additional training.

📝 Abstract
Although multimodal large language models (MLLMs) have achieved impressive performance, the multimodal instruction tuning stage often causes catastrophic forgetting of the base LLM's language ability, even in strong models like Llama3. To address this, we propose Locate-then-Merge, a training-free parameter fusion framework that first locates important parameters and then selectively merges them. We further introduce Neuron-Fusion, a neuron-level strategy that preserves the influence of neurons with large parameter shifts--neurons likely responsible for newly acquired visual capabilities--while attenuating the influence of neurons with smaller changes that likely encode general-purpose language skills. This design enables better retention of visual adaptation while mitigating language degradation. Experiments on 13 benchmarks across both language and visual tasks show that Neuron-Fusion consistently outperforms existing model merging methods. Further analysis reveals that our method effectively reduces context hallucination in generation.
Problem

Research questions and friction points this paper is trying to address.

Mitigates catastrophic forgetting in multimodal LLMs
Preserves language skills during visual adaptation
Reduces context hallucination in model generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free parameter fusion framework Locate-then-Merge
Neuron-level strategy Neuron-Fusion for selective merging
Preserves neurons with large shifts, attenuates small changes
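
The page gives no pseudocode for Neuron-Fusion, but the bullets above suggest its core loop. Below is a minimal NumPy sketch of that idea, assuming per-neuron shift is measured as a row-wise L2 norm of the weight delta, preservation keeps the tuned weights for the top-shifted neurons, and attenuation linearly interpolates the rest back toward the base weights; the function name, `keep_ratio`, and `alpha` are all illustrative assumptions, not the paper's published procedure.

```python
import numpy as np

def neuron_fusion(w_base, w_tuned, keep_ratio=0.1, alpha=0.2):
    """Hypothetical sketch of neuron-level parameter fusion.

    w_base, w_tuned: (num_neurons, in_dim) weight matrices of the same
    layer in the base LLM and the visually instruction-tuned MLLM.
    """
    delta = w_tuned - w_base
    # Per-neuron parameter shift: L2 norm of each neuron's weight change.
    shift = np.linalg.norm(delta, axis=1)
    # Neurons with the largest shifts are assumed vision-specialized.
    k = max(1, int(keep_ratio * len(shift)))
    top = np.argsort(shift)[-k:]
    mask = np.zeros(len(shift), dtype=bool)
    mask[top] = True
    # Attenuate small-shift (likely language) neurons toward the base weights.
    fused = w_base + alpha * delta
    # Preserve large-shift (likely vision) neurons at their tuned values.
    fused[mask] = w_tuned[mask]
    return fused
```

Applied layer by layer, this would keep the newly acquired visual circuitry intact while pulling general-purpose language neurons most of the way back to the base model, consistent with the training-free framing above.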