Arcee’s MergeKit: A Toolkit for Merging Large Language Models

📅 2024-03-20
🏛️ Conference on Empirical Methods in Natural Language Processing
📈 Citations: 51 · Influential: 4
🤖 AI Summary
To address knowledge forgetting, parameter interference, and high training overhead in multi-task large language model (LLM) fusion, this paper proposes a lightweight, hardware-agnostic, zero-training model merging framework. The framework supports declarative configuration and distributed pipelining, and integrates state-of-the-art parameter merging algorithms, including weighted averaging, TIES-Merging, and DARE, to enable scalable, training-free merging of large-scale models. Its core contribution is the first open-source, general-purpose, fine-tuning-free merging infrastructure that preserves task generalization while maintaining the performance stability of the original models. Within the Hugging Face ecosystem, the framework has been used to merge thousands of open-source models, yielding multiple state-of-the-art (SOTA) checkpoints. Empirical results demonstrate significant improvements in cross-task robustness and deployment efficiency, validating its effectiveness for practical, large-scale LLM integration.
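To make the merging algorithms named in the summary concrete, here is a minimal, illustrative sketch of weighted averaging and a TIES-style merge (trim small deltas, elect a sign per parameter, average the deltas that agree). This is not MergeKit's actual implementation; parameters are represented as flat Python lists purely for clarity.

```python
# Illustrative sketch, NOT MergeKit's real code: two merge strategies the
# summary mentions, operating on flat lists of parameter values.

def weighted_average(task_params, weights):
    """Element-wise weighted average of several models' parameters."""
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, task_params)) / total
            for i in range(len(task_params[0]))]

def ties_merge(base, task_params, density=0.5):
    """TIES-style merge of fine-tuned models onto a shared base model."""
    # Deltas of each fine-tuned model relative to the base.
    deltas = [[p - b for p, b in zip(params, base)] for params in task_params]
    # Trim: keep only the top-`density` fraction of each delta by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, int(density * len(d)))
        threshold = sorted(abs(x) for x in d)[-k]
        trimmed.append([x if abs(x) >= threshold else 0.0 for x in d])
    merged = list(base)
    for i in range(len(base)):
        # Resolve sign conflicts: elect the dominant sign at this position,
        # then average only the deltas that agree with it.
        column = [d[i] for d in trimmed]
        s = sum(column)
        if s == 0:
            continue  # no consensus (or nothing survived trimming)
        agreeing = [x for x in column if x * s > 0]
        merged[i] += sum(agreeing) / len(agreeing)
    return merged
```

On a toy example, two fine-tunes that agree on one parameter and conflict on another illustrate the difference: weighted averaging blends the conflicting updates, while the TIES-style merge drops them.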

📝 Abstract
The rapid growth of open-source language models provides the opportunity to merge model checkpoints, combining their parameters to improve performance and versatility. Advances in transfer learning have led to numerous task-specific models, which model merging can integrate into powerful multitask models without additional training. MergeKit is an open-source library designed to support this process with an efficient and extensible framework suitable for any hardware. It has facilitated the merging of thousands of models, contributing to some of the world’s most powerful open-source model checkpoints. The library is accessible at: https://github.com/arcee-ai/mergekit.
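The workflow the abstract describes is driven by a declarative YAML configuration. A sketch of what such a config can look like is shown below; the model names are illustrative placeholders, and the authoritative schema and supported `merge_method` values are documented in the MergeKit repository linked above.

```yaml
# Hypothetical merge configuration (model names are placeholders):
# combine two fine-tunes of a shared base model with TIES-Merging.
models:
  - model: org/base-model-7b          # base model: no parameters needed
  - model: org/math-finetune-7b
    parameters:
      density: 0.5   # fraction of delta weights to keep after trimming
      weight: 0.5
  - model: org/code-finetune-7b
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: org/base-model-7b
dtype: float16
```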
Problem

Research questions and friction points this paper is trying to address.

Multi-task Learning
Knowledge Consolidation
Transfer Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Merging
Multi-task Learning
Open-source Toolkit