SaFRO: Satisfaction-Aware Fusion via Dual-Relative Policy Optimization for Short-Video Search

📅 2026-03-19
🤖 AI Summary
This work addresses a limitation of existing short-video search systems: their multi-task fusion approaches predominantly optimize immediate interaction metrics at the expense of long-term user satisfaction. To bridge this gap, the authors propose SaFRO, which combines a query-level satisfaction-aware reward model, a Dual-Relative Policy Optimization (DRPO) mechanism, and a Task-Relation-Aware Fusion module that enables context-sensitive adaptive weighting of multiple objectives. The proposed method maintains strong short-term ranking performance while significantly improving long-term user retention. Extensive offline evaluations and online A/B tests on the Kuaishou platform show that the approach outperforms current baselines across key metrics, supporting its effectiveness and practical applicability in real-world large-scale search scenarios.
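
To make the idea of weighting multiple objectives into one ranking score concrete, here is a minimal sketch. It assumes a softmax-normalized weighted sum, which is only one common way to fuse scores and is not stated in the source; the function names, the three task names (click, long-view, like), and all numbers are hypothetical.

```python
import math

def softmax(logits):
    """Turn arbitrary weight logits into positive weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scores(task_scores, weight_logits):
    """Combine per-task predictions into one score per item.

    Assumption: a plain weighted sum; the fusion form used by SaFRO
    is not given in this summary.
    """
    weights = softmax(weight_logits)
    return [sum(w * s for w, s in zip(weights, item)) for item in task_scores]

def rank(task_scores, weight_logits):
    """Return item indices sorted by fused score, best first."""
    fused = fuse_scores(task_scores, weight_logits)
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)

# Hypothetical predictions for three videos: (click, long-view, like).
candidates = [
    [0.9, 0.2, 0.1],
    [0.5, 0.8, 0.3],
    [0.3, 0.4, 0.9],
]

# The same candidates ranked under two different weightings.
click_heavy = rank(candidates, [2.0, 0.0, 0.0])
view_heavy = rank(candidates, [0.0, 2.0, 0.0])
```

In this toy setup the ordering changes with the weights, which is why a policy that picks the weights per query context can matter for what the user finally sees.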

📝 Abstract
Multi-Task Fusion plays a pivotal role in industrial short-video search systems by aggregating heterogeneous prediction signals into a unified ranking score. However, existing approaches predominantly optimize for immediate engagement metrics, which often fail to align with long-term user satisfaction. While Reinforcement Learning (RL) offers a promising avenue for user satisfaction optimization, its direct application to search scenarios is non-trivial because search data is sparser and more constrained by user intent than recommendation feeds. To this end, we propose SaFRO, a novel framework designed to optimize user satisfaction in short-video search. We first construct a satisfaction-aware reward model that utilizes query-level behavioral proxies to capture holistic user satisfaction beyond item-level interactions. Then we introduce Dual-Relative Policy Optimization (DRPO), an efficient policy learning method that updates the fusion policy through relative preference comparisons within groups and across batches. Furthermore, we design a Task-Relation-Aware Fusion module to explicitly model the interdependencies among different objectives, enabling context-sensitive weight adaptation. Extensive offline evaluations and large-scale online A/B tests on the Kuaishou short-video search platform demonstrate that SaFRO significantly outperforms state-of-the-art baselines, delivering substantial gains in both short-term ranking quality and long-term user retention.
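
The abstract describes DRPO only as comparing rewards "within groups and across batches". The sketch below is one possible reading of that phrase, not the paper's formula: each query contributes a group of rewards for sampled fusion actions, rewards are standardized inside the group, the group means are standardized across the batch, and the two signals are blended. The function names, the `mix` coefficient, and the reward values are all hypothetical.

```python
from statistics import mean, pstdev

def standardize(values, eps=1e-8):
    """Center values on their mean and scale by their population std."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]

def dual_relative_advantages(batch_rewards, mix=0.5):
    """Blend a within-group and a cross-batch relative signal.

    batch_rewards: list of groups, one group per query; each group holds
    the rewards of several sampled fusion actions for that query.
    Assumption: a linear blend controlled by `mix`; the source does not
    specify how the two comparisons are combined.
    """
    # Cross-batch signal: how each query's average reward compares
    # with the other queries in the batch.
    group_means = [mean(group) for group in batch_rewards]
    cross_batch = standardize(group_means)

    advantages = []
    for group, cross in zip(batch_rewards, cross_batch):
        # Within-group signal: how each action compares with its siblings.
        within = standardize(group)
        advantages.append([mix * w + (1.0 - mix) * cross for w in within])
    return advantages

# Hypothetical satisfaction rewards: two queries, two sampled actions each.
advantages = dual_relative_advantages([[1.0, 3.0], [5.0, 7.0]])
```

A policy-gradient update would then raise the probability of actions with positive advantage and lower it for negative ones. Using relative comparisons avoids training a separate value network, which is the usual motivation for group-relative methods; whether SaFRO shares that motivation is not stated here.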
Problem

Research questions and friction points this paper is trying to address.

short-video search
user satisfaction
multi-task fusion
ranking optimization
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Satisfaction-Aware Reward
Dual-Relative Policy Optimization
Task-Relation-Aware Fusion
Short-Video Search
Multi-Task Fusion
🔎 Similar Papers
Renzhe Zhou
Kuaishou Technology
Songyang Li
Kuaishou Technology
Feiran Zhu
Kuaishou Technology
Chenglei Dai
Kuaishou Technology
Yi Zhang
Huawei Co., Ltd
CV, AI, Trustworthy AI
Yi Wang
Kuaishou Technology
Jingwei Zhuo
JD Inc
Machine Learning