SaFRO: Satisfaction-Aware Fusion via Dual-Relative Policy Optimization for Short-Video Search

📅 2026-03-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing short-video search systems: their multi-task fusion approaches predominantly optimize immediate interaction metrics at the expense of long-term user satisfaction. To bridge this gap, we propose a satisfaction-aware framework that combines a query-level reward model, a Dual-Relative Policy Optimization (DRPO) mechanism for policy learning, and a task-relation-aware fusion module that adapts the weights of multiple objectives to the context. The method maintains strong short-term ranking performance while significantly improving long-term user retention. Extensive offline evaluations and online A/B tests on the Kuaishou platform show that it consistently outperforms existing baselines across key metrics, supporting its effectiveness and practical applicability in real-world, large-scale short-video search.
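The idea of a query-level reward can be illustrated with a minimal sketch. The summary does not say which behavioral proxies SaFRO uses, so the `QuerySession` fields (watch time, clicks, query reformulation, quick exit) and all weights below are hypothetical; the only point shown is that the reward scores a whole query session rather than individual items.

```python
import math
from dataclasses import dataclass


@dataclass
class QuerySession:
    """Hypothetical query-level behavioral proxies for one search request."""
    total_watch_seconds: float  # summed play time over all results of this query
    num_clicks: int             # number of results the user opened
    reformulated: bool          # user issued a rewritten query soon afterwards
    quick_exit: bool            # user left the result page almost immediately


def satisfaction_reward(s: QuerySession,
                        w_watch: float = 1.0,
                        w_click: float = 0.5,
                        w_reform: float = 1.0,
                        w_exit: float = 1.0) -> float:
    """Score a whole query session instead of individual items.

    Watch time is log-compressed so a single long video does not dominate,
    clicks are capped at 5, and reformulation or a quick exit counts as
    dissatisfaction. All weights are illustrative placeholders.
    """
    positive = (w_watch * math.log1p(s.total_watch_seconds)
                + w_click * min(s.num_clicks, 5))
    negative = w_reform * float(s.reformulated) + w_exit * float(s.quick_exit)
    return positive - negative


satisfied = QuerySession(total_watch_seconds=120.0, num_clicks=3,
                         reformulated=False, quick_exit=False)
unsatisfied = QuerySession(total_watch_seconds=5.0, num_clicks=1,
                           reformulated=True, quick_exit=True)
print(satisfaction_reward(satisfied) > satisfaction_reward(unsatisfied))  # True
```

A learned reward model would replace the hand-set weights; the hand-set version simply makes the query-level granularity visible.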

📝 Abstract

Multi-task fusion plays a pivotal role in industrial short-video search systems by aggregating heterogeneous prediction signals into a unified ranking score. However, existing approaches predominantly optimize for immediate engagement metrics, which often fail to align with long-term user satisfaction. While Reinforcement Learning (RL) offers a promising avenue for optimizing user satisfaction, applying it directly to search is non-trivial: compared with recommendation feeds, search suffers from sparser data and is constrained by the user's explicit query intent. To this end, we propose SaFRO, a novel framework designed to optimize user satisfaction in short-video search. We first construct a satisfaction-aware reward model that uses query-level behavioral proxies to capture holistic user satisfaction beyond item-level interactions. We then introduce Dual-Relative Policy Optimization (DRPO), an efficient policy learning method that updates the fusion policy through relative preference comparisons within groups and across batches. Furthermore, we design a Task-Relation-Aware Fusion module that explicitly models the interdependencies among different objectives, enabling context-sensitive weight adaptation. Extensive offline evaluations and large-scale online A/B tests on the Kuaishou short-video search platform demonstrate that SaFRO significantly outperforms state-of-the-art baselines, delivering substantial gains in both short-term ranking quality and long-term user retention.
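DRPO's two relative comparisons can be sketched as a group-relative advantage (in the style of GRPO) plus a batch-relative term. This is an assumption about the mechanism, not the paper's formulation: the function names, the mixing coefficient `beta`, and the REINFORCE-style surrogate loss are all hypothetical.

```python
import math
import statistics
from typing import List


def dual_relative_advantages(group_rewards: List[List[float]],
                             beta: float = 0.5,
                             eps: float = 1e-8) -> List[List[float]]:
    """Hypothetical dual-relative advantages for one training batch.

    group_rewards[q][g] is the reward of the g-th fusion-weight vector
    sampled for query q. Two relative signals are mixed:
      * intra-group: compare samples of the same query, which cancels out
        how easy or hard the query is;
      * cross-batch: compare each sample with every sample in the batch.
    """
    flat = [r for group in group_rewards for r in group]
    batch_mean, batch_std = statistics.fmean(flat), statistics.pstdev(flat)
    advantages = []
    for group in group_rewards:
        g_mean, g_std = statistics.fmean(group), statistics.pstdev(group)
        advantages.append([
            (r - g_mean) / (g_std + eps) + beta * (r - batch_mean) / (batch_std + eps)
            for r in group
        ])
    return advantages


def surrogate_loss(advantages: List[List[float]],
                   log_probs: List[List[float]]) -> float:
    """REINFORCE-style loss: minimizing it raises the log-probability of
    samples with positive advantage and lowers it for negative ones."""
    terms = [a * lp
             for adv_g, lp_g in zip(advantages, log_probs)
             for a, lp in zip(adv_g, lp_g)]
    return -sum(terms) / len(terms)


adv = dual_relative_advantages([[1.0, 3.0], [2.0, 2.0]])
loss = surrogate_loss(adv, [[math.log(0.4), math.log(0.6)],
                            [math.log(0.5), math.log(0.5)]])
```

When every sample for a query receives the same reward, the intra-group term vanishes, and only the cross-batch term can still carry signal (if the query's rewards differ from the batch mean); this is one plausible motivation for combining the two comparisons under sparse search feedback. In a real trainer `log_probs` would be differentiable tensors; plain floats keep the sketch dependency-free.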

Problem

Research questions and friction points this paper is trying to address.

short-video search

user satisfaction

multi-task fusion

ranking optimization

reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Satisfaction-Aware Reward

Dual-Relative Policy Optimization

Task-Relation-Aware Fusion
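The Task-Relation-Aware Fusion idea can be pictured with a minimal, dependency-free sketch, assuming a design the summary does not confirm: self-attention over hypothetical per-task embeddings models the interdependencies among objectives, the relation-mixed embeddings are scored against a context vector, and a softmax yields the fusion weights for a weighted-sum ranking score. All names (`relation_aware_weights`, `fused_score`) and the weighted-sum form are illustrative.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def relation_aware_weights(task_emb: List[List[float]],
                           context: List[float]) -> List[float]:
    """Hypothetical fusion weights that account for task relations.

    1. Self-attention over task embeddings lets each task borrow
       information from related tasks (e.g. click and watch time).
    2. The relation-mixed embeddings are scored against the request
       context, and a softmax turns the scores into fusion weights.
    """
    d = len(context)
    mixed = []
    for e_i in task_emb:
        attn = softmax([dot(e_i, e_j) / math.sqrt(d) for e_j in task_emb])
        mixed.append([sum(a * e_j[k] for a, e_j in zip(attn, task_emb))
                      for k in range(d)])
    return softmax([dot(h, context) for h in mixed])


def fused_score(task_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Unified ranking score: weighted sum of per-task predictions."""
    return sum(w * s for w, s in zip(weights, task_scores))


emb = [[1.0, 0.0], [0.0, 1.0]]                # hypothetical embeddings for 2 tasks
w_a = relation_aware_weights(emb, [2.0, 0.0])  # context favoring task 0
w_b = relation_aware_weights(emb, [0.0, 2.0])  # context favoring task 1
```

The same predicted task scores are thus weighted differently depending on the request context, which is the context-sensitive behavior the module is meant to provide; how SaFRO actually parameterizes and trains this module is not described in the summary.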