TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models

📅 2026-04-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a critical limitation in existing MoE-enhanced LoRA approaches, which assume expert independence and consequently suffer from routing instability and expert dominance. To overcome this, we propose TalkLoRA, a novel framework that introduces a lightweight inter-expert communication mechanism prior to routing, enabling cross-subspace information exchange to produce more robust global routing signals. TalkLoRA is the first to incorporate structured expert interaction within MoE-LoRA architectures, theoretically mitigating perturbation amplification, smoothing routing dynamics, and providing a strict generalization of prior designs. Extensive experiments demonstrate that TalkLoRA consistently outperforms both LoRA and MoE-LoRA across diverse language understanding and generation tasks, achieving higher parameter efficiency and more balanced expert activation.
📝 Abstract
Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of Large Language Models (LLMs), and recent Mixture-of-Experts (MoE) extensions further enhance flexibility by dynamically combining multiple LoRA experts. However, existing MoE-augmented LoRA methods assume that experts operate independently, often leading to unstable routing and expert dominance. In this paper, we propose **TalkLoRA**, a communication-aware MoE-LoRA framework that relaxes this independence assumption by introducing expert-level communication prior to routing. TalkLoRA equips low-rank experts with a lightweight Talking Module that enables controlled information exchange across expert subspaces, producing a more robust global signal for routing. Theoretically, we show that expert communication smooths routing dynamics by mitigating perturbation amplification while strictly generalizing existing MoE-LoRA architectures. Empirically, TalkLoRA consistently outperforms vanilla LoRA and MoE-LoRA across diverse language understanding and generation tasks, achieving higher parameter efficiency and more balanced expert routing under comparable parameter budgets. These results highlight structured expert communication as a principled and effective enhancement for MoE-based parameter-efficient adaptation. Code is available at https://github.com/why0129/TalkLoRA.
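The abstract describes the mechanism only at a high level: each LoRA expert produces a low-rank output, a lightweight "Talking Module" mixes these expert signals before routing, and the mixed signal drives the gate. Below is a minimal NumPy sketch of that idea; all names (`talk`, `W_route`, the choice to route on mixed expert outputs) are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_experts = 16, 4, 3  # hidden size, LoRA rank, number of experts

# Frozen base weight and per-expert LoRA factors (delta_W_i = B_i @ A_i)
W = rng.normal(size=(d, d)) * 0.02
A = rng.normal(size=(n_experts, r, d)) * 0.02   # down-projections
B = rng.normal(size=(n_experts, d, r)) * 0.02   # up-projections

# Hypothetical router weights and "Talking" mixing matrix: identity plus a
# small learned perturbation, so experts exchange information before routing.
W_route = rng.normal(size=(n_experts, d)) * 0.02
talk = np.eye(n_experts) + rng.normal(size=(n_experts, n_experts)) * 0.1

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def forward(x):
    # Per-expert low-rank outputs: h_i = B_i @ (A_i @ x), shape (n_experts, d)
    h = np.einsum('edr,erq,q->ed', B, A, x)
    # Talking step: mix expert signals across subspaces BEFORE routing
    h_talk = talk @ h
    # Routing logits computed from the communicated (global) signal
    logits = (W_route * h_talk).sum(axis=1)
    gates = softmax(logits)
    # Frozen base path plus gate-weighted combination of expert outputs
    y = W @ x + gates @ h
    return y, gates

x = rng.normal(size=d)
y, gates = forward(x)
```

Setting `talk` to the identity recovers an ordinary independent-expert MoE-LoRA gate, which matches the abstract's claim that the design strictly generalizes existing MoE-LoRA architectures.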
Problem

Research questions and friction points this paper is trying to address.

- Low-Rank Adaptation
- Mixture of Experts
- expert routing
- parameter-efficient fine-tuning
- expert dominance
Innovation

Methods, ideas, or system contributions that make the work stand out.

- TalkLoRA
- Mixture of Experts
- Low-Rank Adaptation
- expert communication
- parameter-efficient fine-tuning