🤖 AI Summary
To address zero-shot cross-lingual transfer for extremely low-resource languages, this paper proposes DeFT-X, a denoised variant of composable sparse fine-tuning. The method introduces singular value decomposition (SVD)-based denoising as a preprocessing step applied to the pretrained weight matrices before sparse fine-tuning, a novel application that improves the robustness and cross-lingual composability of the resulting sparse masks. DeFT-X combines this denoising with magnitude-based pruning and compositional sparse fine-tuning (SFT), enabling efficient transfer using only labeled data from a high-resource source language and unlabeled text from the target language. Evaluated on sentiment classification and natural language inference across the NusaX and AmericasNLI benchmarks, DeFT-X matches or surpasses both standard SFT and state-of-the-art cross-lingual baselines on multiple extremely low-resource languages. These results demonstrate DeFT-X's strong generalization and practical efficacy in resource-constrained multilingual settings.
📄 Abstract
Effective cross-lingual transfer remains a critical challenge in scaling the benefits of large language models from high-resource to low-resource languages. Towards this goal, prior studies have explored many approaches to combine task knowledge from task-specific data in a (high-resource) source language and language knowledge from unlabeled text in a (low-resource) target language. One notable approach proposed composable sparse fine-tuning (SFT) for cross-lingual transfer that learns task-specific and language-specific sparse masks to select a subset of the pretrained model's parameters that are further fine-tuned. These sparse fine-tuned vectors (SFTs) are subsequently composed with the pretrained model to facilitate zero-shot cross-lingual transfer to a task in a target language, using only task-specific data from a source language. These sparse masks for SFTs were identified using simple magnitude-based pruning. In our work, we introduce DeFT-X, a novel composable SFT approach that denoises the weight matrices of a pretrained model before magnitude pruning using singular value decomposition, thus yielding more robust SFTs. We evaluate DeFT-X on a diverse set of extremely low-resource languages for sentiment classification (NusaX) and natural language inference (AmericasNLI) and demonstrate that it performs on par with or outperforms SFT and other prominent cross-lingual transfer baselines.
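The core pipeline step the abstract describes (SVD-based denoising of a pretrained weight matrix, followed by magnitude-based selection of a sparse mask) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the fixed truncation rank, and the sparsity level are all hypothetical choices, and the actual method learns the sparse fine-tuned vectors through training rather than from a single matrix.

```python
import numpy as np

def svd_denoise(W, rank):
    # Low-rank reconstruction: keep only the top-`rank` singular
    # components of W, discarding small components treated as noise.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def magnitude_mask(W, keep_fraction):
    # Magnitude-based pruning: a boolean mask over the largest-magnitude
    # entries, marking the parameters that sparse fine-tuning would update.
    k = max(1, int(W.size * keep_fraction))
    threshold = np.sort(np.abs(W).ravel())[-k]
    return np.abs(W) >= threshold

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))            # stand-in pretrained weight matrix
W_denoised = svd_denoise(W, rank=8)      # hypothetical truncation rank
mask = magnitude_mask(W_denoised, keep_fraction=0.1)
```

Under this sketch, the mask is computed on the denoised matrix rather than on the raw weights, which is the change the abstract highlights relative to plain magnitude pruning.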