Zenith: Scaling up Ranking Models for Billion-scale Livestreaming Recommendation

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inference-latency challenge of modeling high-dimensional sparse feature interactions in billion-scale live-streaming recommendation systems. To this end, it proposes the Zenith architecture, which tokenizes high-dimensional features and introduces two key components, Token Fusion and Token Boost, to efficiently identify and prioritize a small set of critical features (termed Prime Tokens). By enhancing token heterogeneity, Zenith improves model performance while keeping inference costs under control. The architecture exhibits markedly better scaling behavior than prior ranking models and delivers strong empirical gains: after deployment on TikTok Live, it achieves a 1.05% increase in CTR AUC, a 1.10% reduction in Logloss, and lifts the number and duration of high-quality viewing sessions per user by 9.93% and 8.11%, respectively.

📝 Abstract
Accurately capturing feature interactions is essential in recommender systems, and recent trends show that scaling up model capacity could be a key driver for next-level predictive performance. While prior work has explored various model architectures to capture multi-granularity feature interactions, relatively little attention has been paid to efficient feature handling and scaling model capacity without incurring excessive inference latency. In this paper, we address this by presenting Zenith, a scalable and efficient ranking architecture that learns complex feature interactions with minimal runtime overhead. Zenith is designed to handle a few high-dimensional Prime Tokens with Token Fusion and Token Boost modules, and exhibits superior scaling laws compared to other state-of-the-art ranking methods, thanks to its improved token heterogeneity. Its real-world effectiveness is demonstrated by deploying the architecture to TikTok Live, a leading online livestreaming platform that attracts billions of users globally. Our A/B test shows that Zenith achieves +1.05%/-1.10% in online CTR AUC and Logloss, and realizes +9.93% gains in Quality Watch Session / User and +8.11% in Quality Watch Duration / User.
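The core idea, restricting expensive interaction layers to a small set of high-importance "Prime Tokens", can be illustrated with a minimal sketch. Note this is purely hypothetical: the abstract does not disclose how Zenith's Token Fusion or Token Boost modules actually work, so the importance scores and top-k routine below are assumptions for illustration, not the paper's method.

```python
import numpy as np

def select_prime_tokens(tokens: np.ndarray, scores: np.ndarray, k: int):
    """Keep only the top-k highest-scoring feature tokens (a toy stand-in
    for "Prime Token" selection), so that downstream interaction layers
    operate on a small, fixed-size set instead of all tokens.

    tokens: (num_tokens, dim) token embeddings
    scores: (num_tokens,) per-token importance (assumed learned elsewhere)
    """
    # Indices of the k largest scores, in descending score order.
    prime_idx = np.argsort(scores)[::-1][:k]
    return tokens[prime_idx], prime_idx

# Toy example: 6 feature tokens of dimension 4; keep the 2 most important.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.05])

prime, idx = select_prime_tokens(tokens, scores, k=2)
print(idx)          # -> [1 3]
print(prime.shape)  # -> (2, 4)
```

The efficiency argument is that interaction cost is typically quadratic in the number of tokens, so shrinking the interaction set from all tokens to k Prime Tokens bounds inference latency while model width can still grow.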
Problem

Research questions and friction points this paper is trying to address.

feature interactions
model scaling
inference latency
ranking models
livestreaming recommendation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prime Tokens
Token Fusion
Token Boost
scaling laws
feature interaction
👥 Authors
Ruifeng Zhang (NC State University)
Zexi Huang (University of California, Santa Barbara)
Zikai Wang (ByteDance)
Ke Sun (ByteDance)
Bohang Zheng (ByteDance)
Ouyang Zhen (ByteDance)
Huimin Xie (ByteDance)
Phil Shen (ByteDance)
Junlin Zhang (ByteDance)
Wentao Guo (ByteDance)
Qinglei Wang (ByteDance)