🤖 AI Summary
Existing parameter-efficient fine-tuning (PEFT) methods for merging multimodal large language models (MLLMs) suffer from two critical limitations: degraded multi-task performance and the risk of leaking original training data. To address these, the authors propose CoPA-Merging, a training-free, cross-task generalizable, and data-leakage-free MLLM merging framework. Its core insight is that, in low-rank decomposition, preserving parameter *direction* and *compensating for the gap between singular values* are essential for robust model merging. Guided by this, they design a complementary parameter adaptation mechanism that jointly mitigates task interference and enhances generalization to unseen tasks. The method integrates parameter pruning, relation-driven construction of scaling coefficients, and cross-task normalization. Evaluated on a newly established multimodal multi-task benchmark, CoPA-Merging achieves an average 12.3% performance gain over state-of-the-art merging approaches and significantly improves generalization, all without accessing the original training data or requiring any additional optimization.
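The low-rank insight above can be made concrete with a generic LoRA-style update; the decomposition below is standard notation, not the paper's exact formulation:

```latex
\Delta W = B A, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
```

Writing the singular value decomposition $\Delta W = U \Sigma V^{\top}$, "direction preservation" refers to keeping the singular vectors $U, V$ of each task's update intact during merging, while "singular value compensation" refers to rescaling the entries of $\Sigma$ to offset the shrinkage that naive averaging of low-rank updates induces.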
📝 Abstract
Fine-tuning pre-trained models on custom data yields numerous expert models specialized for specific tasks. Merging these experts into a single universal model with multi-task ability, while avoiding data leakage, has gained popularity. As data and model sizes grow, parameter-efficient tuning has become the common practice for obtaining task-specific models efficiently. However, we observe that existing methods designed for merging fully fine-tuned models fail under efficient tuning. To address this, we analyze the problem through low-rank decomposition and reveal that maintaining direction and compensating for the gap between singular values are crucial for efficient model merging. Consequently, we propose CoPA-Merging, a training-free parameter-efficient merging method with complementary parameter adaptation. Specifically, we (1) prune parameters and construct scaling coefficients from inter-parameter relations to compensate for the performance drop caused by task interference, and (2) perform cross-task normalization to enhance generalization to unseen tasks. We establish a benchmark consisting of diverse multimodal tasks, on which experiments certify the outstanding performance and generalizability of our method. Additional studies and extensive analyses further demonstrate its effectiveness.
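The prune-scale-normalize pipeline in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the magnitude-based pruning rule, the `keep_ratio` value, and the norm-based scaling heuristic are all placeholders for the relation-driven coefficients the authors actually derive.

```python
import numpy as np

def prune(delta, keep_ratio=0.2):
    """Keep only the largest-magnitude entries of a task update (assumed rule)."""
    k = max(1, int(delta.size * keep_ratio))
    thresh = np.sort(np.abs(delta), axis=None)[-k]
    return np.where(np.abs(delta) >= thresh, delta, 0.0)

def merge(base, task_deltas, keep_ratio=0.2):
    """Merge per-task parameter updates into one set of weights."""
    pruned = [prune(d, keep_ratio) for d in task_deltas]
    # Hypothetical scaling: weight each task by its retained update norm,
    # then normalize the coefficients across tasks so they sum to one.
    norms = np.array([np.linalg.norm(p) for p in pruned])
    coeffs = norms / norms.sum()
    merged_delta = sum(c * p for c, p in zip(coeffs, pruned))
    return base + merged_delta

# Toy usage with random "expert" updates in place of real fine-tuned deltas.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 4))
deltas = [0.1 * rng.normal(size=(4, 4)) for _ in range(3)]
merged = merge(base, deltas)
```

The cross-task normalization step (dividing by the coefficient sum) is what keeps the merged update on a comparable scale regardless of how many experts are merged.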