AI Summary
Existing embedding models often degrade in mixed-language multilingual retrieval, and conventional contrastive learning tends to induce language clustering, which makes it hard to balance cross-lingual alignment with embedding uniformity. To address this, the work proposes MIMO, a two-stage training framework. In the first stage, a high-performing English teacher model initializes the student through knowledge distillation, establishing stable cross-lingual semantic alignment. In the second stage, the distillation loss is optimized jointly with cross-lingual contrastive learning to sharpen retrieval discrimination. Combining the two objectives improves discrimination while preserving alignment, and the accompanying analysis exposes a trade-off between alignment and uniformity. Experiments show that MIMO consistently outperforms existing cross-lingual training methods across multiple monolingual and multilingual retrieval benchmarks, and that it remains competitive with off-the-shelf models of similar or larger scale.
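The two stages described above can be sketched as a single loss function. This is a minimal illustration, not the authors' implementation: the summary does not state the exact loss forms, so the mean-squared-error distillation term, the InfoNCE contrastive term with in-batch negatives, the `temperature` and `weight` parameters, and all function names are assumptions made for the example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def distill_loss(student, teacher):
    """Mean squared distance between student and frozen teacher embeddings.

    Assumption: student[i] embeds a text in some language and teacher[i]
    embeds its English counterpart, so the English space acts as the anchor.
    """
    total = 0.0
    for s, t in zip(student, teacher):
        total += sum((a - b) ** 2 for a, b in zip(s, t))
    return total / len(student)


def contrastive_loss(queries, docs, temperature=0.05):
    """InfoNCE with in-batch negatives: docs[i] is the positive for queries[i]."""
    q = [normalize(v) for v in queries]
    d = [normalize(v) for v in docs]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, dj) / temperature for dj in d]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(q)


def two_stage_loss(student_q, student_d, teacher_q, stage, weight=1.0):
    """Stage 1: distillation only. Stage 2: distillation plus contrastive."""
    kd = distill_loss(student_q, teacher_q)
    if stage == 1:
        return kd
    # `weight` is a hypothetical balancing coefficient, not a value from the paper.
    return kd + weight * contrastive_loss(student_q, student_d)
```

In this sketch the distillation term stays active in stage 2, which mirrors the summary's point that the contrastive objective is added while alignment to the teacher space is preserved.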
Abstract
Multilingual Information Retrieval (MLIR) reflects real-world search environments in which queries and relevant documents may appear in different languages within a mixed-language corpus. However, existing embedding models are primarily optimized for Multi-Monolingual retrieval and their performance often degrades in MLIR settings. Moreover, directly applying conventional contrastive learning to MLIR can exacerbate language clustering and expose a trade-off between cross-lingual alignment and embedding uniformity. To address these limitations, we propose MIMO: Multilingual Information Retrieval via Monolingual Objectives, a two-stage framework that uses a stable English semantic space from a high-performing teacher model as an anchor. MIMO first initializes the student model's cross-lingual alignment through knowledge distillation, and then jointly optimizes distillation and cross-lingual contrastive learning to improve retrieval discrimination while preserving alignment. Extensive experiments show that MIMO consistently outperforms existing cross-lingual training baselines across various MLIR and Multi-Monolingual benchmarks. MIMO also remains competitive with off-the-shelf models of similar or larger parameter scales. Furthermore, our cross-lingual Alignment-Uniformity analysis clarifies the distinct roles of the two loss components and shows that their combination yields a favorable trade-off between alignment and uniformity.
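For readers unfamiliar with alignment and uniformity, the sketch below computes the standard definitions of the two metrics on L2-normalized embeddings: alignment is the mean squared distance between positive pairs, and uniformity is the log of the mean Gaussian potential over all distinct pairs. Lower is better for both. The abstract does not specify how the cross-lingual variant is computed, so treating a positive pair as a text and its translation, and the choice `t=2.0`, are assumptions made for illustration.

```python
import math
from itertools import combinations


def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment(pairs):
    """Mean squared distance between the two embeddings of each positive pair.

    Assumption: each pair holds embeddings of the same content in two languages.
    """
    return sum(sq_dist(normalize(x), normalize(y)) for x, y in pairs) / len(pairs)


def uniformity(embeddings, t=2.0):
    """Log of the mean pairwise Gaussian potential; more negative means more spread out."""
    points = [normalize(v) for v in embeddings]
    values = [math.exp(-t * sq_dist(u, v)) for u, v in combinations(points, 2)]
    return math.log(sum(values) / len(values))
```

A model that collapses every input to the same point scores a perfect alignment of 0 but the worst possible uniformity of 0, which is one way to see why the two quantities trade off against each other.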