Multi-Value-Product Retrieval-Augmented Generation for Industrial Product Attribute Value Identification

📅 2025-09-28
🤖 AI Summary
Product Attribute Value Identification (PAVI) is critical for enhancing e-commerce search, recommendation, and business analytics, yet existing approaches suffer from cascading errors, poor out-of-distribution (OOD) value detection, and limited generalization. To address these challenges, we propose a multi-paradigm PAVI framework integrating retrieval, generation, and classification. Specifically, we design a hierarchical semantic retrieval mechanism that models products and attribute values as layered units; develop a category-constrained retrieval-augmented generation (RAG) pipeline with three stages (similar-product retrieval, candidate-value extraction, and large language model-based generation); and introduce a ranking-generation synergy strategy to mitigate error propagation and improve OOD robustness. Deployed in a real-world industrial environment, the method outperforms state-of-the-art baselines and generalizes across diverse categories and unseen values.

📝 Abstract
Identifying attribute values from product profiles, a task we call Product Attribute Value Identification (PAVI), is key to improving product search, recommendation, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges: cascading errors, an inability to handle out-of-distribution (OOD) attribute values, and limited generalization capability. To address these limitations, we introduce Multi-Value-Product Retrieval-Augmented Generation (MVP-RAG), which combines the strengths of the retrieval, generation, and classification paradigms. MVP-RAG frames PAVI as a retrieval-generation task in which the product title description serves as the query, and products and attribute values act as the corpus. It first retrieves similar products of the same category and candidate attribute values, and then generates the standardized attribute values. The key advantages of this work are: (1) a multi-level retrieval scheme that treats products and attribute values as distinct hierarchical levels in the PAVI domain; (2) attribute value generation with a large language model, which significantly alleviates the OOD problem; and (3) successful deployment in a real-world industrial environment. Extensive experimental results demonstrate that MVP-RAG performs better than the state-of-the-art baselines.
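The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Product` record, the token-overlap (Jaccard) similarity standing in for whatever retriever MVP-RAG actually uses, and the prompt wording. The sketch stops at building the prompt and does not call a language model.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical corpus record: title, category, and known attribute values."""
    title: str
    category: str
    attributes: dict[str, str] = field(default_factory=dict)


def tokenize(text: str) -> set[str]:
    """Lowercased whitespace tokens; a stand-in for a learned text encoder."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity (assumed here, not the paper's retriever)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_similar_products(title: str, category: str,
                              corpus: list[Product], k: int = 2) -> list[Product]:
    """Level 1: keep only same-category products, then rank by similarity."""
    query = tokenize(title)
    same_category = [p for p in corpus if p.category == category]
    ranked = sorted(same_category,
                    key=lambda p: jaccard(query, tokenize(p.title)),
                    reverse=True)
    return ranked[:k]


def candidate_values(neighbors: list[Product], attribute: str) -> list[str]:
    """Level 2: collect attribute values from the neighbors, most frequent first."""
    counts = Counter(p.attributes[attribute]
                     for p in neighbors if attribute in p.attributes)
    return [value for value, _ in counts.most_common()]


def build_prompt(title: str, attribute: str,
                 neighbors: list[Product], candidates: list[str]) -> str:
    """Assemble the retrieved context into a prompt for a language model."""
    examples = "\n".join(
        f"- {p.title} -> {p.attributes.get(attribute, 'unknown')}"
        for p in neighbors
    )
    return (
        f"Product: {title}\n"
        f"Attribute: {attribute}\n"
        f"Similar products:\n{examples}\n"
        f"Candidate values: {', '.join(candidates)}\n"
        "Answer with the standardized value; it may fall outside the candidates."
    )
```

In this sketch the category filter runs before ranking, so a product from another category can never be retrieved, however similar its title looks.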
Problem

Research questions and friction points this paper is trying to address.

Identifying attribute values from e-commerce product profiles
Addressing cascading errors and out-of-distribution value challenges
Improving generalization in product attribute value identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-level retrieval scheme for products and attribute values
Large language model generates standardized attribute values
Alleviates out-of-distribution problem through generation approach
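To make the out-of-distribution point concrete, the sketch below contrasts free-form generation with a closed label set: a generated value that matches a known value is mapped to its canonical spelling, while an unseen value is kept and flagged rather than forced into the nearest known class. The source does not describe any post-processing step, so the `resolve_value` helper and its matching rule (case-insensitive exact match) are purely hypothetical.

```python
def resolve_value(generated: str, vocabulary: list[str]) -> tuple[str, bool]:
    """Return (value, is_ood) for a generated attribute value.

    Hypothetical helper: known values are canonicalized, unseen values
    are preserved and flagged as out-of-distribution.
    """
    cleaned = " ".join(generated.split())  # trim and collapse whitespace
    lookup = {value.casefold(): value for value in vocabulary}
    canonical = lookup.get(cleaned.casefold())
    if canonical is not None:
        return canonical, False
    # A closed-set classifier would have to pick some known label here;
    # a generator can emit a value the vocabulary has never seen.
    return cleaned, True
```

A flagged value could then be reviewed and added to the vocabulary, although that workflow is likewise an assumption and not something the source states.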
Authors

Huike Zou, Xianyu of Alibaba
Haiyang Yang, Xianyu of Alibaba
Yindu Su, Xiaohongshu Inc.
Liyu Chen, Xianyu of Alibaba
Chengbao Lian, Xianyu of Alibaba
Qingheng Zhang, Xianyu of Alibaba
Shuguang Han, Google AI (information retrieval)
Jufeng Chen, Xianyu of Alibaba