Dense Retrieval for Low Resource Languages -- the Case of Amharic Language

📅 2025-03-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses core challenges in dense retrieval for Amharic—a low-resource language with 120 million speakers—including scarcity of labeled data, pretraining resources, and word embeddings. We present the first systematic feasibility study, proposing a lightweight fine-tuning and cross-lingual transfer framework built upon mBERT and XLM-R. Our approach integrates contrastive learning, pseudo-labeling, and unsupervised domain adaptation to train dense encoders. Evaluated on a newly constructed Amharic QA retrieval benchmark—the first of its kind—we achieve a 37% improvement in Recall@10 over baseline methods, substantially outperforming traditional sparse retrieval and zero-shot cross-lingual baselines. Key contributions are: (1) the first publicly available Amharic dense retrieval benchmark; (2) empirical validation of lightweight adaptation and cross-lingual transfer efficacy in low-resource settings; and (3) a reproducible methodology for information retrieval in African languages.

Technology Category

Application Category

📝 Abstract

This paper reports some difficulties and some results when using dense retrievers on Amharic, one of the low-resource languages spoken by 120 millions populations. The efforts put and difficulties faced by University Addis Ababa toward Amharic Information Retrieval will be developed during the presentation.

Problem

Research questions and friction points this paper is trying to address.

Dense retrieval challenges in Amharic, a low-resource language

Addressing information retrieval difficulties for 120M Amharic speakers

University efforts to improve Amharic IR systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dense retrieval for Amharic language

Addressing low-resource language challenges

Collaboration with University Addis Ababa

🔎 Similar Papers

No similar papers found.

Authors to Follow