Pantheon: Personalized Multi-objective Ensemble Sort via Iterative Pareto Policy Optimization

📅 2025-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses key challenges in live-stream recommendation: poor generalization of manually engineered ensemble ranking, difficulty in balancing multiple objectives, and delayed personalized response. The authors propose Pantheon, an end-to-end personalized multi-objective ensemble ranking framework. Methodologically, it uses a representation inheritance architecture that takes fine-grained hidden states from the real-time ranking model as input, trains the ensemble model jointly with that ranking model, and introduces Iterative Pareto Policy Optimization (IPPO) to balance the multiple objectives. According to the authors, this is the first work to fully replace formula-based ensemble ranking in an industrial-scale recommendation system. The solution is fully deployed on Kuaishou's live-streaming services, serving 400 million users daily, with the goal of improving multi-objective balance and personalization responsiveness.

📝 Abstract

In this paper, we present Pantheon, our milestone ensemble sort work, together with first-hand practical experience. Pantheon transforms ensemble sorting from a "human-curated art" to a "machine-optimized science". Compared with formulation-based ensemble sort, Pantheon has the following advantages: (1) Personalized joint training: Pantheon is jointly trained with the real-time ranking model, so it can accurately capture ever-changing personalized user interests. (2) Representation inheritance: instead of the highly compressed Pxtrs, Pantheon takes the fine-grained hidden states as model input, and so benefits from the ranking model to enhance its own model complexity. Meanwhile, to reach a balanced multi-objective ensemble sort, we further devise an iterative Pareto policy optimization (IPPO) strategy that considers the multiple objectives at the same time. To our knowledge, this paper is the first work to replace the entire formulation-based ensemble sort in an industrial RecSys. It was fully deployed at Kuaishou live-streaming services, serving 400 million users daily.
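To make the contrast in the abstract concrete, here is a minimal sketch of the two styles of ensemble sort. It is an illustration under stated assumptions, not the paper's implementation: it assumes Pxtrs are per-objective scalar predictions from the ranking model, uses a common multiplicative hand-tuned formula as a stand-in for "formulation-based" scoring, and stands in a single logistic layer for the learned model. All function and parameter names are hypothetical.

```python
import math
from typing import Dict, List


def formula_ensemble_score(
    pxtrs: Dict[str, float],
    weights: Dict[str, float],
    exponents: Dict[str, float],
) -> float:
    """Hand-tuned fusion of scalar predictions (assumed formula shape).

    score = prod_i (1 + w_i * pxtr_i) ** a_i, with w_i and a_i set by hand.
    """
    score = 1.0
    for name, p in pxtrs.items():
        score *= (1.0 + weights[name] * p) ** exponents[name]
    return score


def learned_ensemble_score(hidden: List[float], w: List[float], b: float) -> float:
    """Learned fusion over a hidden-state vector (toy logistic layer).

    In a jointly trained setup, w and b would be learned together with the
    ranking model rather than tuned by hand; here they are plain inputs.
    """
    z = sum(h * wi for h, wi in zip(hidden, w)) + b
    return 1.0 / (1.0 + math.exp(-z))


# Formula-based: only the compressed scalars are visible to the scorer.
formula_score = formula_ensemble_score(
    pxtrs={"ctr": 0.5},
    weights={"ctr": 2.0},
    exponents={"ctr": 1.0},
)

# Learned: the scorer sees the richer hidden-state vector instead.
learned_score = learned_ensemble_score(hidden=[0.0, 0.0], w=[1.0, -1.0], b=0.0)
```

The design point the sketch tries to show is only the change of input and of who sets the parameters: scalars plus hand-set weights in the first case, a hidden-state vector plus trainable weights in the second.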

Problem

Research questions and friction points this paper is trying to address.

Transforming ensemble sorting from human-curated to machine-optimized

Personalized joint training for capturing user interests

Balancing multi-objective ensemble sort via iterative optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Personalized joint training with ranking model

Uses fine-grained hidden states as input

Iterative Pareto policy optimization for multiple objectives
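The notion of a Pareto-balanced trade-off mentioned above can be illustrated with a small sketch. This is not IPPO itself, whose details are not given on this page. It only shows the standard definition of Pareto dominance and a non-dominated filter over candidate policies, assuming each candidate is summarized by a tuple of objective values where higher is better. The function names and the example numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the candidates that no other candidate dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates, each scored on two objectives (higher is better).
candidates = [(0.30, 0.10), (0.20, 0.20), (0.10, 0.10), (0.25, 0.05)]
front = pareto_front(candidates)
```

Here `(0.10, 0.10)` and `(0.25, 0.05)` are both dominated by `(0.30, 0.10)`, so only the first two candidates remain on the front. Neither of those two is better than the other on both objectives, which is the kind of trade-off a multi-objective method has to choose between.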

🔎 Similar Papers

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization