MoKA: Mixture of Kronecker Adapters

📅 2025-08-05

🤖 AI Summary
Existing low-rank adapters suffer from limited expressivity due to rigid, fixed-rank constraints, hindering their effectiveness on complex tasks. To address this, we propose MoKA—a parameter-efficient fine-tuning method based on *mixture of Kronecker products*. MoKA employs a learnable gating mechanism to dynamically evaluate and combine Kronecker factors, enabling fine-grained, task-adaptive rank allocation. This design preserves extreme parameter efficiency—reducing trainable parameters by up to 27×—while substantially enhancing modeling capacity. Crucially, MoKA relies solely on standard matrix operations, ensuring native compatibility with GPU acceleration and low-bit quantization. Extensive experiments on quantized LLaMA2-7B and LLaMA3-8B models demonstrate that MoKA consistently outperforms state-of-the-art PEFT methods in both accuracy and efficiency, achieving new SOTA performance.
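As a rough illustration of the mixture idea described above, a weight update can be built as a gated sum of Kronecker products, with a softmax gate weighting each factor pair. The shapes, the softmax gate form, and all variable names below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: adapt a 16x16 weight with K Kronecker terms.
K = 4                             # number of Kronecker factor pairs in the mixture
A = rng.normal(size=(K, 4, 4))    # left factors  (4x4 each)
B = rng.normal(size=(K, 4, 4))    # right factors (4x4 each)
logits = rng.normal(size=K)       # learnable gate logits (assumed softmax gate)

# Gate: importance weight per Kronecker factor pair, summing to 1.
g = np.exp(logits - logits.max())
g /= g.sum()

# Weight update: a gated mixture of Kronecker products.
delta_W = sum(g[k] * np.kron(A[k], B[k]) for k in range(K))

print(delta_W.shape)  # (16, 16)
```

Each `np.kron(A[k], B[k])` term contributes a full 16x16 update from only 2·(4·4) = 32 parameters, which is where the parameter savings over a dense update come from.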

📝 Abstract
Parameter-efficient fine-tuning (PEFT) is essential for reducing the computational overhead of large language models (LLMs). Low-rank family adapters are commonly used to control the parameter size efficiently while maintaining the generative power of LLMs. However, their limited expressiveness due to the rank constraint often restricts their performance on complex tasks. We propose Mixture of Kronecker Adapters (MoKA), a new generation of Kronecker adapters that addresses this limitation by modeling weight updates as a mixture of Kronecker products. Our proposed adapter leverages a gating mechanism that measures the importance of each Kronecker factor, enabling more expressive adaptation. Moreover, MoKA enables a rank flexibility that provides a better trade-off between parameter efficiency and accuracy. To ensure hardware efficiency, we reformulate Kronecker computations using standard matrix operations, allowing seamless deployment on GPU-optimized hardware. We conduct extensive experiments on instruction-tuning and commonsense reasoning tasks using low-bit quantized versions of LLaMA2-7B and LLaMA3-8B models. MoKA not only outperforms PEFT baselines, but also reduces the number of trainable parameters up to 27x, achieving state-of-the-art trade-offs between performance and parameter efficiency.
Problem

Research questions and friction points this paper is trying to address.

Enhancing expressiveness of low-rank adapters for complex tasks
Improving trade-off between parameter efficiency and model accuracy
Enabling hardware-efficient deployment of Kronecker adapter mixtures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Kronecker products for expressive adaptation
Gating mechanism for dynamic Kronecker factor importance
Hardware-efficient reformulation using standard matrix operations
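The paper's exact reformulation is not reproduced on this page, but the standard row-major vec identity (A ⊗ B) vec(X) = vec(A X Bᵀ) illustrates how a Kronecker-structured multiply reduces to two small matrix multiplications, avoiding materialization of the large Kronecker matrix; all names and shapes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 6))    # left Kronecker factor
B = rng.normal(size=(5, 7))    # right Kronecker factor
X = rng.normal(size=(6, 7))    # input vector of length 42, viewed as a 6x7 matrix

# Naive: materialize the (8*5) x (6*7) Kronecker matrix, then multiply.
naive = np.kron(A, B) @ X.ravel()

# Reformulated: two small matmuls plus a reshape, using the row-major
# vec identity (A ⊗ B) vec(X) = vec(A X B^T).
fast = (A @ X @ B.T).ravel()

print(np.allclose(naive, fast))  # True
```

The reformulated path uses only dense matmuls and reshapes, which is the kind of standard-operation decomposition that maps well onto GPU kernels and quantized inference stacks.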
Mohammadreza Sadeghi
Huawei Noah’s Ark Lab
Mahsa Ghazvini Nejad
Huawei Noah’s Ark Lab
MirHamed Jafarzadeh Asl
Huawei Noah’s Ark Lab
Yu Gu
Huawei Noah’s Ark Lab, Department of Mathematics and Statistics, McGill University
Yuanhao Yu
Huawei Noah’s Ark Lab
Masoud Asgharian
Professor, Dept of Math & Stat, McGill University
Statistics, OR/Optimization, ML/DNN/LLM
Vahid Partovi Nia
Huawei Noah's Ark Lab and École Polytechnique de Montréal
high-dimensional data, statistical learning, deep learning, edge intelligence