Exploring Pretraining via Active Forgetting for Improving Cross Lingual Transfer for Decoder Language Models

📅 2024-10-21
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Weak cross-lingual transfer, particularly to low-resource languages, remains a key limitation of decoder-only large language models (LLMs). To address this, we propose a pretraining strategy built around an active forgetting mechanism. This work is the first to bring active forgetting regularization into decoder-only multilingual pretraining, combining multilingual mixed-data training with an analysis of the learned representations. The approach markedly improves zero-shot cross-lingual generalization to unseen languages. Experiments show consistent, substantial gains over same-scale baselines on multilingual downstream tasks, including XNLI and XQuAD, with the largest improvements on low-resource languages. Notably, the resulting decoder-only models reach cross-lingual transfer performance on par with strong encoder-based multilingual models such as XLM-RoBERTa, strengthening the case for decoder-only architectures in broadly inclusive, resource-agnostic language understanding.

📝 Abstract
Large Language Models (LLMs) demonstrate exceptional capabilities in a multitude of NLP tasks. However, the efficacy of such models in languages other than English is often limited. Prior works have shown that encoder-only models such as BERT or XLM-RoBERTa show impressive cross-lingual transfer of their capabilities from English to other languages. In this work, we propose a pretraining strategy that uses active forgetting to achieve similar cross-lingual transfer in decoder-only LLMs. We show that LLMs pretrained with active forgetting are highly effective when adapting to new and unseen languages. Through extensive experimentation, we find that LLMs pretrained with active forgetting learn better multilingual representations, which translates into better performance on many downstream tasks.
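As a concrete illustration of the mechanism, the snippet below shows how an active-forgetting pretraining loop is commonly structured: training proceeds as a standard causal-LM loop, but the token embedding table is re-initialized at a fixed interval while the transformer body keeps its weights. This is a minimal sketch assuming the embedding-reset formulation of active forgetting from prior work; the Hugging Face-style `get_input_embeddings()` interface, the `reset_every` interval, and the loop details are illustrative assumptions rather than this paper's exact recipe.

```python
import torch
import torch.nn as nn


def reset_embeddings(model: nn.Module, std: float = 0.02) -> None:
    """Re-initialize the input token embedding table in place, leaving the
    transformer body and all other parameters untouched."""
    embeddings = model.get_input_embeddings()
    with torch.no_grad():
        nn.init.normal_(embeddings.weight, mean=0.0, std=std)


def pretrain_with_active_forgetting(model, dataloader, optimizer,
                                    reset_every: int = 10_000,
                                    total_steps: int = 100_000):
    """Causal-LM pretraining loop with periodic embedding resets
    (active forgetting). Hyperparameter values are illustrative."""
    model.train()
    step = 0
    data_iter = iter(dataloader)
    while step < total_steps:
        try:
            batch = next(data_iter)
        except StopIteration:  # restart the (multilingual) data stream
            data_iter = iter(dataloader)
            batch = next(data_iter)

        # Standard next-token prediction; Hugging Face causal-LM models
        # shift the labels internally.
        loss = model(input_ids=batch["input_ids"],
                     attention_mask=batch["attention_mask"],
                     labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        step += 1

        # Active forgetting: periodically discard what the embedding layer
        # has learned, pushing the transformer body toward representations
        # that do not depend on any particular token embedding and are
        # therefore easier to transfer across languages.
        if step % reset_every == 0 and step < total_steps:
            reset_embeddings(model)
```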
Problem

Research questions and friction points this paper is trying to address.

Improving cross-lingual transfer for decoder-only LLMs
Enhancing multilingual representation learning in LLMs
Boosting performance in downstream NLP tasks for non-English languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active forgetting enhances cross-lingual transfer (a sketch of the adaptation step follows this list)
Pretraining strategy for decoder-only LLMs
Better multilingual representations improve downstream tasks
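These points rest on the adapted model reusing its language-agnostic transformer body for unseen languages. One common way to realize this with active forgetting, stated here as an assumption about the protocol rather than a confirmed detail of this paper, is to freeze the pretrained body and relearn only the token embeddings on data from the new language. The sketch below reuses `reset_embeddings` from the earlier snippet and assumes the optimizer is constructed over the embedding parameters only.

```python
def adapt_to_new_language(model, new_lang_dataloader, optimizer,
                          steps: int = 5_000):
    """Hypothetical adaptation stage: freeze the transformer body and relearn
    only the token embeddings on an unseen language."""
    # Freeze everything, then unfreeze just the input embedding table.
    for param in model.parameters():
        param.requires_grad = False
    model.get_input_embeddings().weight.requires_grad = True

    # Start the new-language embeddings from a fresh random initialization.
    reset_embeddings(model)

    model.train()
    data_iter = iter(new_lang_dataloader)
    for _ in range(steps):
        try:
            batch = next(data_iter)
        except StopIteration:
            data_iter = iter(new_lang_dataloader)
            batch = next(data_iter)
        loss = model(input_ids=batch["input_ids"],
                     attention_mask=batch["attention_mask"],
                     labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```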