Extracting Important Tokens in E-Commerce Queries with a Tag Interaction-Aware Transformer Model

📅 2025-07-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address inaccurate identification of key query tokens and insufficient intent understanding in e-commerce search, this paper proposes TagBERT—a novel model that formulates query rewriting as a token-level classification task. TagBERT introduces, for the first time within the Transformer architecture, an explicit semantic label interaction mechanism, enabling joint modeling of query tokens and their fine-grained semantic labels (e.g., brand, category, attribute). Leveraging dependency-aware contextual encoding and label-aware attention, the model learns more discriminative token representations. Extensive experiments on a large-scale real-world e-commerce dataset demonstrate that TagBERT significantly outperforms strong baselines—including BERT, eBERT, and Seq2Seq Transformer—on key token identification, achieving absolute F1 improvements of 3.2–5.8 percentage points. These gains effectively enhance downstream query reformulation and product matching performance.

📝 Abstract
The major task of any e-commerce search engine is to retrieve the inventory items that best match the user intent reflected in a query. This task is non-trivial for many reasons, including ambiguous queries, misaligned vocabulary between buyers and sellers, and over- or under-constrained queries caused by too many or too few tokens. To address these challenges, query reformulation is used, which modifies a user query through token dropping, replacement, or expansion, with the objective of bridging the semantic gap between query tokens and users' search intent. Early query reformulation methods mostly relied on statistical measures derived from token co-occurrence frequencies in selected user sessions containing clicks or purchases. In recent years, supervised deep learning approaches, specifically transformer-based neural language models and sequence-to-sequence models, have been applied to the query reformulation task. However, these models do not utilize the semantic tags of query tokens, which are significant for capturing the user intent of an e-commerce query. In this work, we pose query reformulation as a token classification task and solve it by designing a dependency-aware transformer-based language model, TagBERT, which makes use of the semantic tags of tokens to learn superior query phrase embeddings. Experiments on large, real-life e-commerce datasets show that TagBERT outperforms a plethora of competing models, including BERT, eBERT, and a sequence-to-sequence transformer model, on the important token classification task.
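The core idea, posing query reformulation as per-token importance classification with semantic tags (brand, category, attribute, etc.) injected into the token representations, can be sketched in a few lines. The snippet below is a minimal illustration with toy random embeddings, not the paper's model: it fuses tag embeddings additively and uses a single linear head, whereas TagBERT uses a Transformer with a label-aware attention mechanism. All vocabularies, dimensions, and the `encode`/`classify` helpers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabularies: query tokens and their semantic tags.
vocab = {"nike": 0, "red": 1, "running": 2, "shoes": 3}
tags = {"brand": 0, "modifier": 1, "attribute": 2, "category": 3}

d = 8  # embedding dimension (toy size)
tok_emb = rng.normal(size=(len(vocab), d))
tag_emb = rng.normal(size=(len(tags), d))

def encode(query_tokens, query_tags):
    """Fuse token and tag embeddings by addition -- one simple way to
    inject tag information; the paper's interaction is attention-based."""
    t = np.stack([tok_emb[vocab[w]] for w in query_tokens])
    g = np.stack([tag_emb[tags[x]] for x in query_tags])
    return t + g  # shape: (num_tokens, d)

# Toy binary head: label each token as important (1) or droppable (0).
W = rng.normal(size=(d,))

def classify(query_tokens, query_tags):
    h = encode(query_tokens, query_tags)
    scores = h @ W                       # one logit per token
    return (scores > 0).astype(int).tolist()

labels = classify(["nike", "red", "running", "shoes"],
                  ["brand", "modifier", "attribute", "category"])
print(labels)  # one 0/1 importance label per query token
```

In a trained system, the important tokens would be kept for downstream retrieval and the rest dropped or replaced, which is the token-dropping variant of reformulation the abstract describes.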
Problem

Research questions and friction points this paper is trying to address.

Improving e-commerce query reformulation using semantic tags
Bridging vocabulary gaps between buyers and sellers
Enhancing token classification with transformer-based TagBERT model
Innovation

Methods, ideas, or system contributions that make the work stand out.

TagBERT model uses semantic token tags
Transformer-based token classification for queries
Dependency-aware embedding improves query reformulation