🤖 AI Summary
This work addresses cross-lingual semantic matching in multilingual e-commerce search. Methodologically, we propose a data-centric large language model (LLM) optimization framework featuring multilingual semantic alignment representation learning, synthetic query–product pair data augmentation, curriculum-based fine-tuning, and relevance-aware loss optimization—collectively enhancing LLMs’ fine-grained matching capability for low-resource languages. Our key contributions are: (1) deep integration of LLMs into the end-to-end retrieval pipeline—not merely as feature extractors—and (2) establishment of a reproducible, multilingual e-commerce relevance annotation paradigm. Evaluated on an international e-commerce search competition benchmark, our approach achieves state-of-the-art (SOTA) performance, ranking first on the official leaderboard. All code and annotated datasets are publicly released to foster reproducibility and community advancement.
📝 Abstract
This report details our methodology and results for the Multilingual E-commerce Search Competition. The task is to recognize the relevance between user queries and product items in a multilingual context, thereby improving recommendation performance on e-commerce platforms. Leveraging the capabilities of Large Language Models (LLMs), our data-centric method achieved the highest score among all competition solutions. The final leaderboard is published at https://alibaba-international-cikm2025.github.io. The source code for our project is available at https://github.com/nhtlongcs/e-commerce-product-search.