Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation

📅 2025-02-24
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Existing model merging methods designed for full fine-tuning break down when applied to multimodal large language models (MLLMs) adapted with parameter-efficient fine-tuning (PEFT), degrading multi-task performance, while retraining on the original task data to recover it risks data leakage. To address this, the authors propose CoPA-Merging, a training-free parameter-efficient merging framework. Analyzing merging through low-rank decomposition, they identify that preserving update directions and compensating for the gap between singular values are essential for robust merging. Guided by this insight, their Complementary Parameter Adaptation mechanism (1) prunes parameters and constructs scaling coefficients from inter-parameter relations to offset the performance drop caused by task interference and (2) performs cross-task normalization to enhance generalization to unseen tasks. Evaluated on a newly established multimodal multi-task benchmark, CoPA-Merging reports an average 12.3% performance gain over state-of-the-art merging approaches and notably better generalization, without accessing original training data or requiring any additional optimization.
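
To make the prune–scale–normalize pipeline concrete, here is a minimal NumPy sketch of merging low-rank task updates. The function name, the magnitude-based pruning rule, the norm-ratio scaling coefficient, and the simple averaging step are illustrative assumptions; the summary does not give CoPA-Merging's exact formulas.

```python
import numpy as np

def merge_lora_deltas(task_deltas, prune_ratio=0.8):
    """Illustrative sketch of a pruning + scaling + normalization merge.

    `task_deltas` is a list of low-rank weight updates (B @ A) from
    separately fine-tuned tasks. The pruning threshold, the scaling
    coefficient, and the normalization step are stand-ins, not the
    paper's exact rules.
    """
    scaled = []
    for delta in task_deltas:
        # 1) prune small-magnitude entries to reduce task interference
        threshold = np.quantile(np.abs(delta), prune_ratio)
        pruned = np.where(np.abs(delta) >= threshold, delta, 0.0)

        # 2) rescale the surviving entries so the pruned update keeps the
        #    overall magnitude of the original one (illustrative choice)
        norm = np.linalg.norm(pruned)
        coeff = np.linalg.norm(delta) / norm if norm > 0 else 1.0
        scaled.append(coeff * pruned)

    # 3) cross-task normalization: average the per-task updates so that
    #    no single task dominates the merged model (illustrative choice)
    return sum(scaled) / len(scaled)

# Toy usage: merge three random rank-4 updates for a 16x16 weight matrix.
rng = np.random.default_rng(0)
deltas = [rng.normal(size=(16, 4)) @ rng.normal(size=(4, 16)) for _ in range(3)]
merged = merge_lora_deltas(deltas)
print(merged.shape)  # (16, 16)
```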

📝 Abstract
Fine-tuning pre-trained models with custom data produces numerous expert models for specific tasks. Merging these models into one universal model that gains multi-task ability while avoiding data leakage has gained popularity. With the growth of data and model size, parameter-efficient tuning has become the common practice for obtaining task-specific models efficiently. However, we observe that existing merging methods designed for full fine-tuning fail under parameter-efficient tuning. To address this, we analyze merging from the perspective of low-rank decomposition and reveal that maintaining the direction and compensating for the gap between singular values are crucial for efficient model merging. Consequently, we propose CoPA-Merging, a training-free parameter-efficient merging method with complementary parameter adaptation. Specifically, we (1) prune parameters and construct scaling coefficients from inter-parameter relations to compensate for the performance drop caused by task interference and (2) perform cross-task normalization to enhance unseen-task generalization. We establish a benchmark consisting of diverse multimodal tasks, on which we conduct experiments to demonstrate the outstanding performance and generalizability of our method. Additional studies and extensive analyses further showcase its effectiveness.
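
As a small numerical illustration of the low-rank-decomposition observation above: naively averaging independent low-rank task updates tends to shrink the leading singular values relative to each individual update, which is the kind of singular-value gap the abstract says must be compensated. The matrices below are synthetic and only illustrate the effect, not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

def low_rank_update(d=32, r=4):
    # A random rank-r update for a d x d weight, standing in for a LoRA delta.
    return rng.normal(size=(d, r)) @ rng.normal(size=(r, d))

delta_a, delta_b = low_rank_update(), low_rank_update()
merged = 0.5 * (delta_a + delta_b)  # naive averaging merge

sv_a = np.linalg.svd(delta_a, compute_uv=False)
sv_merged = np.linalg.svd(merged, compute_uv=False)
print("top singular value, task A :", round(sv_a[0], 2))
print("top singular value, merged :", round(sv_merged[0], 2))  # typically smaller
```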
Problem

Research questions and friction points this paper is trying to address.

Efficient merging of multimodal large language models
Addressing task interference in model merging
Enhancing generalization for unseen tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free parameter-efficient merging (CoPA-Merging)
Complementary parameter adaptation
Cross-task normalization
Fanhu Zeng
Institute of Automation, Chinese Academy of Sciences
Multimodal LLM · Trustworthy AI · Efficient Learning
Haiyang Guo
Institute of Automation, Chinese Academy of Sciences
Continual Learning · Multimodal Learning · Pattern Recognition
Fei Zhu
Centre for Artificial Intelligence and Robotics, HKISI-CAS
Li Shen
School of Cyber Science and Technology, Sun Yat-sen University
Hao Tang
School of Computer Science, Peking University