Multilingual Fine-Tuning via Localized Gradient Conflict Resolution

📅 2026-06-03
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses negative interference across languages during fine-tuning of multilingual large language models, which often degrades performance. The authors formulate multilingual fine-tuning as a multi-objective optimization (MOO) problem and propose Bucket-Level MOO, a distributed framework that applies gradient-based MOO algorithms locally on parameter buckets. Resolving gradient conflicts within each bucket enables conflict-aware parameter updates without reconstructing full gradient vectors. Theoretically, the approach enforces Refined Pareto Stationarity, a strictly tighter necessary condition for Pareto optimality; empirically, it drives the model to construct distinct language-specific dimensions, which improves representational separability. Across four base LLMs, the method improves downstream-task performance on both seen and unseen languages over standard fine-tuning.
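For orientation, the multi-objective view described in this summary can be written in the standard textbook form below. This is not the paper's own notation: the symbols $\theta$, $K$, $\mathcal{L}_k$, and $\lambda$ are assumptions introduced here for illustration. The paper's refined condition is not defined in this summary, so only the standard Pareto stationarity condition is shown.

```latex
% One loss per language k = 1..K, all sharing parameters \theta (notation assumed).
\min_{\theta} \; \bigl( \mathcal{L}_1(\theta), \dots, \mathcal{L}_K(\theta) \bigr)

% Standard Pareto stationarity: some convex combination of the
% per-language gradients vanishes at \theta.
\exists\, \lambda \in \Delta^{K-1} : \quad
\sum_{k=1}^{K} \lambda_k \, \nabla_{\theta} \mathcal{L}_k(\theta) = 0,
\qquad
\Delta^{K-1} = \Bigl\{ \lambda \in \mathbb{R}^{K} : \lambda_k \ge 0,\ \sum_{k=1}^{K} \lambda_k = 1 \Bigr\}
```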
📝 Abstract
The rapid evolution of Large Language Models (LLMs) has established cross-lingual versatility as a defining feature of modern systems. However, fine-tuning these models frequently induces negative interference across languages. To address this, we reformulate multilingual fine-tuning as a multi-objective optimization (MOO) problem. Specifically, we introduce Bucket-Level MOO, a scalable distributed framework that applies gradient-based MOO algorithms locally on parameter buckets. This enables conflict-aware updates without the prohibitive communication overhead of reconstructing full gradient vectors. Theoretically, we prove this localized resolution natively enforces Refined Pareto Stationarity, a strictly tighter necessary condition for Pareto optimality. Empirically, Bucket-Level MOO mitigates interference by driving LLMs to construct distinct language-specific dimensions, improving representational separability. Extensive experiments across four base LLMs demonstrate that our method significantly improves both seen and unseen multilingual performance over standard fine-tuning paradigms.
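The idea of resolving conflicts locally on parameter buckets can be illustrated with a small single-process sketch. The abstract does not say which gradient-based MOO algorithm is applied inside each bucket, so this example swaps in a PCGrad-style projection as a stand-in. The function names (`dot`, `resolve_bucket`, `bucket_level_update`), the fixed `bucket_size`, and the toy gradients are all hypothetical, and nothing here models the distributed communication the paper targets.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def resolve_bucket(grads, rng):
    """PCGrad-style stand-in (an assumption, not the paper's algorithm).

    `grads` holds one gradient slice per language for a single bucket.
    Each slice is projected away from any other slice it conflicts with
    (negative inner product), then the projected slices are averaged.
    """
    projected = []
    for i, g in enumerate(grads):
        g = list(g)
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)  # PCGrad visits the other tasks in random order
        for j in others:
            gj = grads[j]
            d = dot(g, gj)
            if d < 0:  # conflict: remove the component of g along gj
                scale = d / dot(gj, gj)
                g = [a - scale * b for a, b in zip(g, gj)]
        projected.append(g)
    n = len(grads)
    return [sum(col) / n for col in zip(*projected)]


def bucket_level_update(grads, bucket_size, seed=0):
    """Resolve conflicts independently inside each contiguous bucket."""
    rng = random.Random(seed)
    dim = len(grads[0])
    update = []
    for start in range(0, dim, bucket_size):
        slices = [g[start:start + bucket_size] for g in grads]
        update.extend(resolve_bucket(slices, rng))
    return update


# Toy gradients for two languages over four parameters (hypothetical values).
# The full vectors do not conflict (inner product 3 > 0), but the first
# bucket does (inner product -1 < 0), so only a local check detects it.
g_lang_a = [1.0, 0.0, 2.0, 0.0]
g_lang_b = [-1.0, 1.0, 2.0, 0.0]

update = bucket_level_update([g_lang_a, g_lang_b], bucket_size=2)
print(update)  # [0.25, 0.75, 2.0, 0.0]
```

In this toy case a whole-vector conflict check would see nothing to fix and fall back to the plain average `[0.0, 0.5, 2.0, 0.0]`, while the per-bucket pass adjusts only the first bucket and leaves the second, conflict-free bucket as the ordinary average.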
Problem

Research questions and friction points this paper is trying to address.

multilingual fine-tuning
negative interference
cross-lingual versatility
gradient conflict
language interference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bucket-Level MOO
multilingual fine-tuning
gradient conflict resolution
multi-objective optimization
Refined Pareto Stationarity