MIMO: Multilingual Information Retrieval via Monolingual Objectives

📅 2026-05-29
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing embedding models perform poorly on mixed-language multilingual retrieval, and conventional contrastive learning tends to induce language clustering, which makes it hard to balance cross-lingual alignment against embedding uniformity. To address this, the paper proposes MIMO, a two-stage training framework. First, the student model is initialized by knowledge distillation from a high-performing English teacher model, which establishes stable cross-lingual semantic alignment. Second, the distillation loss is optimized jointly with a cross-lingual contrastive loss to improve retrieval discrimination. Combining knowledge distillation with contrastive learning improves discrimination while preserving alignment, and the accompanying analysis exposes a trade-off between alignment and uniformity. Experiments show that MIMO consistently outperforms existing cross-lingual training baselines across MLIR and Multi-Monolingual retrieval benchmarks and remains competitive with off-the-shelf models of similar or larger parameter scales.
๐Ÿ“ Abstract
Multilingual Information Retrieval (MLIR) reflects real-world search environments in which queries and relevant documents may appear in different languages within a mixed-language corpus. However, existing embedding models are primarily optimized for Multi-Monolingual retrieval and their performance often degrades in MLIR settings. Moreover, directly applying conventional contrastive learning to MLIR can exacerbate language clustering and expose a trade-off between cross-lingual alignment and embedding uniformity. To address these limitations, we propose MIMO: Multilingual Information Retrieval via Monolingual Objectives, a two-stage framework that uses a stable English semantic space from a high-performing teacher model as an anchor. MIMO first initializes the student model's cross-lingual alignment through knowledge distillation, and then jointly optimizes distillation and cross-lingual contrastive learning to improve retrieval discrimination while preserving alignment. Extensive experiments show that MIMO consistently outperforms existing cross-lingual training baselines across various MLIR and Multi-Monolingual benchmarks. MIMO also remains competitive with off-the-shelf models of similar or larger parameter scales. Furthermore, our cross-lingual Alignment-Uniformity analysis clarifies the distinct roles of the two loss components and shows that their combination yields a favorable trade-off between alignment and uniformity.
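
The two-stage objective described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the loss formulas, so the mean-squared-error distillation term, the InfoNCE contrastive term with in-batch negatives, the `temperature` and `weight` parameters, and all function names (`distill_loss`, `contrastive_loss`, `mimo_style_loss`) are assumptions chosen for the example. Embeddings are plain Python lists so the sketch runs with the standard library only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def distill_loss(student_embs, teacher_embs):
    """Assumed distillation term: MSE between student and frozen teacher embeddings."""
    total = 0.0
    for s, t in zip(student_embs, teacher_embs):
        total += sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
    return total / len(student_embs)

def contrastive_loss(query_embs, doc_embs, temperature=0.05):
    """Assumed contrastive term: InfoNCE where doc i is the positive for query i."""
    queries = [normalize(q) for q in query_embs]
    docs = [normalize(d) for d in doc_embs]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, d) / temperature for d in docs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(queries)

def mimo_style_loss(student_embs, teacher_embs, query_embs, doc_embs,
                    stage, weight=1.0):
    """Stage 1: distillation only. Stage 2: distillation plus weighted contrastive loss."""
    kd = distill_loss(student_embs, teacher_embs)
    if stage == 1:
        return kd
    return kd + weight * contrastive_loss(query_embs, doc_embs)

# Toy batch: teacher embeddings of English text, student embeddings of the
# parallel text in another language, and cross-lingual query/document pairs.
teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[0.9, 0.1], [0.2, 0.8]]
queries = [[1.0, 0.1], [0.1, 1.0]]
docs = [[0.9, 0.0], [0.0, 0.9]]

stage1 = mimo_style_loss(student, teacher, queries, docs, stage=1)
stage2 = mimo_style_loss(student, teacher, queries, docs, stage=2)
```

Keeping the distillation term active in stage 2 mirrors the abstract's point that contrastive learning alone can worsen language clustering, so the teacher's English semantic space continues to act as an anchor while the contrastive term sharpens retrieval discrimination.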
Problem

Research questions and friction points this paper is trying to address.

Multilingual Information Retrieval
cross-lingual alignment
embedding uniformity
language clustering
contrastive learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual information retrieval
knowledge distillation
cross-lingual alignment
contrastive learning
alignment-uniformity trade-off
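
The alignment-uniformity trade-off listed above is commonly quantified with two metrics on unit-normalized embeddings: alignment, the mean squared distance between paired embeddings, and uniformity, the log of the mean Gaussian potential over all pairs. The sketch below uses those widely used definitions. Whether the paper's cross-lingual analysis uses exactly this formulation is an assumption, as are the function names and the default `t=2.0`.

```python
import math
from itertools import combinations

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def alignment(src_embs, tgt_embs):
    """Mean squared distance between paired embeddings (lower = better aligned)."""
    pairs = [(normalize(s), normalize(t)) for s, t in zip(src_embs, tgt_embs)]
    return sum(sq_dist(s, t) for s, t in pairs) / len(pairs)

def uniformity(embs, t=2.0):
    """Log mean Gaussian potential over distinct pairs (lower = more uniform)."""
    points = [normalize(e) for e in embs]
    potentials = [math.exp(-t * sq_dist(u, v))
                  for u, v in combinations(points, 2)]
    return math.log(sum(potentials) / len(potentials))

# Paired embeddings of the same content in two languages (toy values).
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[1.0, 0.0], [0.0, 1.0]]

# A collapsed space versus one spread around the unit circle.
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]

perfect_alignment = alignment(src, tgt)
collapsed_uniformity = uniformity(collapsed)
spread_uniformity = uniformity(spread)
```

A model can score well on alignment simply by collapsing all embeddings to one point, which is why the two metrics are read together: the collapsed set above is perfectly aligned with itself but has the worst possible uniformity.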