Quality Over Clicks: Intrinsic Quality-Driven Iterative Reinforcement Learning for Cold-Start E-Commerce Query Suggestion

📅 2026-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of poor query suggestion performance in e-commerce cold-start scenarios, where the absence of click-through data hinders conventional approaches. To overcome this limitation, the authors propose Cold-EQS, a novel framework that eliminates reliance on click-through rates by introducing a multidimensional intrinsic quality reward—comprising answerability, factuality, and information gain—for reinforcement learning. The framework further incorporates uncertainty estimation to identify hard examples and iteratively refines the model through reinforcement learning. Additionally, the study presents EQS-Benchmark, the first query suggestion benchmark dataset specifically designed for cold-start e-commerce settings. Experimental results demonstrate that Cold-EQS significantly improves chat unique visitors (chatUV) by 6.81%, with strong alignment between offline evaluation metrics and online performance.
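As an illustration of how the three quality dimensions named above could be folded into a single scalar reward, the sketch below uses a weighted sum. The summary does not state how Cold-EQS combines the dimensions, so the weighted sum, the `QualityScores` container, the `intrinsic_reward` function, the weights, and the [0, 1] score range are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class QualityScores:
    """Hypothetical per-suggestion scores, each assumed to lie in [0, 1]."""
    answerability: float
    factuality: float
    information_gain: float


def intrinsic_reward(scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the three quality dimensions into one scalar reward.

    A weighted sum is an assumption; the source does not specify the
    aggregation used by Cold-EQS.
    """
    if len(weights) != 3:
        raise ValueError("expected one weight per quality dimension")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    w_ans, w_fact, w_gain = weights
    return (
        w_ans * scores.answerability
        + w_fact * scores.factuality
        + w_gain * scores.information_gain
    )


# Example: a suggestion that is factual but adds no new information.
example = QualityScores(answerability=0.5, factuality=1.0, information_gain=0.0)
reward = intrinsic_reward(example, weights=(0.5, 0.25, 0.25))
```

Because no click signal appears anywhere in this computation, a reward of this kind can be evaluated for suggestions that have never been shown to users, which is the property the cold-start setting requires.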

📝 Abstract
Existing dialogue systems rely on Query Suggestion (QS) to enhance user engagement. Recent efforts typically pair large language models with a Click-Through Rate (CTR) model, yet they fail in cold-start scenarios because effective CTR model training depends heavily on abundant online click data. To bridge this gap, we propose Cold-EQS, an iterative reinforcement learning framework for Cold-Start E-commerce Query Suggestion (EQS). Specifically, we leverage answerability, factuality, and information gain as rewards to continuously optimize the quality of suggested queries. To keep improving the QS model across iterations, we estimate uncertainty for grouped candidate suggested queries and use it to select hard and ambiguous samples from online user queries that lack click signals. In addition, we provide EQS-Benchmark, which comprises 16,949 online user queries for offline training and evaluation. Extensive offline and online experiments consistently show a strong positive correlation between offline and online effectiveness. Both sets of results demonstrate the superiority of Cold-EQS, which achieves a significant +6.81% improvement in online chatUV.
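The abstract says that uncertainty is estimated over grouped candidate suggestions to pick hard and ambiguous samples, but it gives no formula. The following is a minimal sketch, assuming uncertainty is measured as the spread (population standard deviation) of reward scores within each user query's candidate group. The names `group_uncertainty` and `select_hard_queries` and the example data are hypothetical and do not come from the paper.

```python
from statistics import pstdev


def group_uncertainty(rewards):
    """Spread of reward scores across one query's candidate suggestions.

    Assumption: a wide spread means the model is inconsistent on this
    query, so the query is treated as hard or ambiguous.
    """
    if len(rewards) < 2:
        return 0.0  # a single candidate carries no spread information
    return pstdev(rewards)


def select_hard_queries(reward_groups, k):
    """Return the k user queries whose candidate groups are most uncertain."""
    ranked = sorted(
        reward_groups,
        key=lambda query: group_uncertainty(reward_groups[query]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical reward scores for the candidates sampled per user query.
reward_groups = {
    "query_a": [0.9, 0.9, 0.9],  # consistent candidates, low uncertainty
    "query_b": [0.1, 0.9, 0.5],  # widely varying candidates
    "query_c": [0.4, 0.6],       # mildly varying candidates
}
hard_queries = select_hard_queries(reward_groups, k=2)
```

In an iterative setup of the kind the abstract describes, the selected queries would feed the next reinforcement learning round. Other uncertainty measures, such as entropy over candidates, would fit the same interface.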
Problem

Research questions and friction points this paper is trying to address.

Cold-Start
Query Suggestion
Click-Through Rate
E-commerce
Reinforcement Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cold-Start Query Suggestion
Intrinsic Quality Reward
Iterative Reinforcement Learning
Uncertainty-Based Sampling
E-commerce Dialogue System
Qi Sun
Alibaba International Digital Commercial Group
Kejun Xiao
Alibaba International Digital Commercial Group
Huaipeng Zhao
Alibaba Inc
Natural Language Processing, Machine Learning
Tao Luo
Alibaba International Digital Commercial Group
Xiaoyi Zeng
Alibaba International Digital Commercial Group