Semantic-enhanced Modality-asymmetric Retrieval for Online E-commerce Search

📅 2025-06-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the modality-asymmetric retrieval problem in e-commerce search, where queries are purely textual while items are multimodal (text-image). To tackle cross-modal representation fusion and semantic alignment, we propose SMAR—a Semantic-enhanced Multimodal Alignment and Retrieval model. SMAR introduces a novel cross-modal alignment mechanism: it models fine-grained text-image associations via deep semantic matching, employs an attention-driven modality interaction module for dynamic feature alignment, and jointly optimizes the text and image encoders in an end-to-end manner. Evaluated on a large-scale industrial e-commerce dataset, SMAR significantly outperforms state-of-the-art unimodal and multimodal baselines, achieving an average +8.2% improvement in Recall@10. Furthermore, we release the first large-scale, publicly available triplet dataset—comprising e-commerce queries, product images, and corresponding textual descriptions—specifically designed for asymmetric multimodal retrieval, thereby enabling reproducible research in this emerging direction.

Technology Category

Application Category

📝 Abstract
Semantic retrieval, which retrieves semantically matched items given a textual query, has been an essential component to enhance system effectiveness in e-commerce search. In this paper, we study the multimodal retrieval problem, where the visual information (e.g, image) of item is leveraged as supplementary of textual information to enrich item representation and further improve retrieval performance. Though learning from cross-modality data has been studied extensively in tasks such as visual question answering or media summarization, multimodal retrieval remains a non-trivial and unsolved problem especially in the asymmetric scenario where the query is unimodal while the item is multimodal. In this paper, we propose a novel model named SMAR, which stands for Semantic-enhanced Modality-Asymmetric Retrieval, to tackle the problem of modality fusion and alignment in this kind of asymmetric scenario. Extensive experimental results on an industrial dataset show that the proposed model outperforms baseline models significantly in retrieval accuracy. We have open sourced our industrial dataset for the sake of reproducibility and future research works.
Problem

Research questions and friction points this paper is trying to address.

Enhancing e-commerce search with semantic retrieval
Addressing multimodal retrieval with visual and textual data
Solving modality fusion in asymmetric query-item scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-enhanced multimodal retrieval for e-commerce
Modality-asymmetric fusion and alignment technique
Open-sourced industrial dataset for reproducibility
🔎 Similar Papers
No similar papers found.
Z
Zhigong Zhou
JD.com, Beijing, China
N
Ning Ding
JD.com, Beijing, China
Xiaochuan Fan
Xiaochuan Fan
Tiktok
Machine LearningSearch & RecommendationComputer Vision
Yue Shang
Yue Shang
Drexel University
NLU/NLGMLinformation retrievalrelevancesemantic match
Y
Yiming Qiu
JD.com, Beijing, China
Jingwei Zhuo
Jingwei Zhuo
JD Inc
Machine Learning
Z
Zhiwei Ge
JD.com, Beijing, China
Songlin Wang
Songlin Wang
R&D Engineer, JD.com
Information RetrievalNatural Language Processing
L
Lin Liu
JD.com, Beijing, China
Sulong Xu
Sulong Xu
京东
H
Han Zhang
JD.com, Beijing, China