Naver Labs Europe @ WSDM CUP | Multilingual Retrieval

📅 2026-02-24

🤖 AI Summary

This work addresses cross-lingual retrieval of multilingual documents from English queries. It proposes an efficient approach built on SPLARE-7B, the authors' own learned sparse retrieval model, combined with Qwen3-Reranker-4B, a lightweight reranker, and a simple score fusion strategy. The method overcomes the performance limitations of conventional dense models in non-English settings and demonstrates the competitiveness of learned sparse retrieval in multilingual environments. Evaluated on the WSDM Cup 2026 benchmark, the system substantially outperforms strong dense baselines such as Qwen3-8B-Embed, highlighting the potential of sparse models for cross-lingual retrieval.
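The summary does not say which fusion rule is used, so the following is only a minimal sketch of one common choice: min-max normalize the first-stage and reranker scores per query, then take a weighted sum. The function names `min_max` and `fuse`, the weight `alpha`, and the toy scores are all assumptions made for illustration and are not taken from the report.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one query's scores to [0, 1] so two scorers become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: no ranking signal, map everything to 0.
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    retriever_scores: Dict[str, float],
    reranker_scores: Dict[str, float],
    alpha: float = 0.5,  # hypothetical weight on the first-stage retriever
) -> List[Tuple[str, float]]:
    """Linear interpolation of normalized scores (an assumed fusion rule)."""
    r = min_max(retriever_scores)
    k = min_max(reranker_scores)
    # Only documents scored by both stages can be fused.
    fused = {d: alpha * r[d] + (1.0 - alpha) * k[d] for d in r if d in k}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Toy scores for three candidate documents of a single query.
first_stage = {"a": 10.0, "b": 5.0, "c": 0.0}
reranked = {"a": 0.2, "b": 0.9, "c": 0.1}
ranking = fuse(first_stage, reranked, alpha=0.5)
```

Normalizing before mixing matters because sparse retrieval scores and reranker scores usually live on different scales; without it, one of the two would dominate the sum regardless of `alpha`.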

📝 Abstract

This report presents our participation in the WSDM Cup 2026 shared task on multilingual document retrieval from English queries. The task provides a challenging benchmark for cross-lingual generalization. It also provides a natural testbed for evaluating SPLARE, our recently proposed learned sparse retrieval model, which produces generalizable sparse latent representations and is particularly well suited to multilingual retrieval settings. We evaluate five progressively enhanced runs, starting from a SPLARE-7B model and incorporating lightweight improvements, including reranking with Qwen3-Reranker-4B and simple score fusion strategies. Our results demonstrate the strength of SPLARE compared to state-of-the-art dense baselines such as Qwen3-8B-Embed. More broadly, our submission highlights the continued relevance and competitiveness of learned sparse retrieval models beyond English-centric scenarios.
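To make "sparse latent representations" concrete, here is a generic sketch of how learned sparse retrieval typically scores a document: query and document are each a small set of active dimensions with weights, and relevance is their dot product over the shared dimensions. This is not SPLARE's actual implementation; the dict-based vectors, the names `sparse_dot` and `rank`, and the toy weights are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Assumed representation: active latent dimension index -> weight.
SparseVec = Dict[int, float]


def sparse_dot(query: SparseVec, doc: SparseVec) -> float:
    """Dot product that only visits dimensions active in both vectors."""
    if len(query) > len(doc):
        query, doc = doc, query  # iterate over the smaller vector
    return sum(w * doc[i] for i, w in query.items() if i in doc)


def rank(query: SparseVec, docs: Dict[str, SparseVec], k: int) -> List[Tuple[str, float]]:
    """Score every document and return the top-k (brute force, no inverted index)."""
    scored = [(doc_id, sparse_dot(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy example: documents in different languages sharing latent dimensions.
query_vec = {3: 1.0, 7: 2.0}
corpus = {
    "doc_en": {3: 0.5, 9: 1.0},
    "doc_fr": {7: 1.5, 3: 0.2},
    "doc_de": {1: 4.0},
}
top = rank(query_vec, corpus, k=2)
```

Because the dimensions in this sketch are language-independent indices rather than surface tokens, an English query can match a non-English document whenever both activate the same dimensions, which is the intuition behind using such representations for cross-lingual retrieval.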

Problem

Research questions and friction points this paper is trying to address.

multilingual retrieval

cross-lingual generalization

document retrieval

English queries

Innovation

Methods, ideas, or system contributions that make the work stand out.

learned sparse retrieval

multilingual retrieval

cross-lingual generalization

SPLARE