🤖 AI Summary
To address zero-shot cross-lingual transfer for extremely low-resource languages, this paper proposes DeFT-X, a denoised variant of composable sparse fine-tuning. The method introduces singular value decomposition (SVD)-based denoising as a preprocessing step applied to the pretrained weight matrices before sparse fine-tuning, a novel application that improves the robustness and cross-lingual composability of the resulting sparse masks. DeFT-X combines this denoising with magnitude-based pruning and compositional sparse fine-tuning (SFT), enabling efficient transfer using only labeled data from a high-resource source language and unlabeled text from the target language. Evaluated on sentiment classification and natural language inference across the NusaX and AmericasNLI benchmarks, DeFT-X matches or surpasses both standard SFT and state-of-the-art cross-lingual baselines on multiple extremely low-resource languages. These results demonstrate DeFT-X's strong generalization and practical efficacy in resource-constrained multilingual settings.
📄 Abstract
Effective cross-lingual transfer remains a critical challenge in scaling the benefits of large language models from high-resource to low-resource languages. Towards this goal, prior studies have explored many approaches to combine task knowledge from task-specific data in a (high-resource) source language and language knowledge from unlabeled text in a (low-resource) target language. One notable approach proposed composable sparse fine-tuning (SFT) for cross-lingual transfer that learns task-specific and language-specific sparse masks to select a subset of the pretrained model's parameters that are further fine-tuned. These sparse fine-tuned vectors (SFTs) are subsequently composed with the pretrained model to facilitate zero-shot cross-lingual transfer to a task in a target language, using only task-specific data from a source language. These sparse masks for SFTs were identified using simple magnitude-based pruning. In our work, we introduce DeFT-X, a novel composable SFT approach that denoises the weight matrices of a pretrained model before magnitude pruning using singular value decomposition, thus yielding more robust SFTs. We evaluate DeFT-X on a diverse set of extremely low-resource languages for sentiment classification (NusaX) and natural language inference (AmericasNLI) and demonstrate that it performs on par with or outperforms SFT and other prominent cross-lingual transfer baselines.
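The core pipeline step the abstract describes (SVD-based denoising of a pretrained weight matrix, followed by magnitude-based selection of a sparse mask) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the fixed truncation rank, and the sparsity level are all hypothetical choices, and the actual method learns the sparse fine-tuned vectors through training rather than from a single matrix.

```python
import numpy as np

def svd_denoise(W, rank):
    # Low-rank reconstruction: keep only the top-`rank` singular
    # components of W, discarding small components treated as noise.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def magnitude_mask(W, keep_fraction):
    # Magnitude-based pruning: a boolean mask over the largest-magnitude
    # entries, marking the parameters that sparse fine-tuning would update.
    k = max(1, int(W.size * keep_fraction))
    threshold = np.sort(np.abs(W).ravel())[-k]
    return np.abs(W) >= threshold

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))            # stand-in pretrained weight matrix
W_denoised = svd_denoise(W, rank=8)      # hypothetical truncation rank
mask = magnitude_mask(W_denoised, keep_fraction=0.1)
```

Under this sketch, the mask is computed on the denoised matrix rather than on the raw weights, which is the change the abstract highlights relative to plain magnitude pruning.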