Generative Early Stage Ranking

📅 2025-11-26
🤖 AI Summary
Problem: In early-stage ranking (ESR) for large-scale recommender systems, decoupling user and item representations limits the model's ability to capture fine-grained user-item cross features. Method: This paper proposes a generative ESR paradigm that, to the authors' knowledge, is the first to apply full target-aware attentional sequence modeling in the ESR stage at industrial scale. It combines a hybrid attention module (hard matching attention, target-aware self-attention, and cross-attention), multi-logit gated fusion, and custom kernel optimizations. Contribution/Results: The approach raises the performance ceiling of decoupled architectures while maintaining high throughput and low latency. It improves ranking accuracy, user engagement, and content consumption metrics. Extensive offline evaluations and large-scale online A/B tests confirm its effectiveness, and the method has been deployed in a production recommender system.
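The summary attributes much of the gain to target-aware attention over the user's behavior sequence. The sketch below illustrates that general idea only. It is not the paper's module, whose exact formulation is not given here. It runs single-query scaled dot-product attention in which the candidate item's embedding is the query and the user's history embeddings serve as keys and values, so the resulting user vector changes with each candidate. The function name `target_attention` and the toy 2-dimensional embeddings are hypothetical, and learned query/key/value projections are omitted.

```python
import math
from typing import List

Vector = List[float]


def target_attention(target: Vector, history: List[Vector]) -> Vector:
    """Hypothetical single-query attention: the candidate item (target) is the
    query, and the user's history embeddings act as both keys and values.
    Learned Q/K/V projections are omitted to keep the sketch minimal."""
    d = len(target)
    # Scaled dot-product score between the target and each history item.
    scores = [sum(t * h for t, h in zip(target, item)) / math.sqrt(d) for item in history]
    # Numerically stable softmax over history positions.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of history embeddings gives a candidate-specific user vector.
    return [sum(w * item[j] for w, item in zip(weights, history)) for j in range(d)]


history = [[1.0, 0.0], [0.0, 1.0]]
print(target_attention([4.0, 0.0], history))  # weighted mostly toward the first history item
print(target_attention([0.0, 4.0], history))  # weighted mostly toward the second history item
```

Because the attention weights depend on the target, the same history yields a different user vector for each candidate. A decoupled user tower lacks this per-candidate dependence, which is also why target-aware models cost more to serve.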

📝 Abstract
Large-scale recommender systems commonly adopt a multi-stage cascading ranking paradigm to balance effectiveness and efficiency. Early Stage Ranking (ESR) systems use a "user-item decoupling" approach, in which independently learned user and item representations are combined only at the final layer. While efficient, this design has limited effectiveness because it struggles to capture fine-grained user-item affinities and cross-signals. To address these limitations, we propose the Generative Early Stage Ranking (GESR) paradigm, which introduces a Mixture of Attention (MoA) module that uses diverse attention mechanisms to bridge the effectiveness gap: the Hard Matching Attention (HMA) module encodes explicit cross-signals by computing raw match counts between user and item features; the Target-Aware Self Attention module generates user representations conditioned on the target item, enabling more personalized learning; and the Cross Attention modules enable earlier and richer interactions between user and item features. MoA's specialized attention encodings are further refined in the final layer by a Multi-Logit Parameterized Gating (MLPG) module, which integrates the newly learned embeddings via gating and produces secondary logits that are fused with the primary logit. To address efficiency and latency challenges, we introduce a comprehensive suite of optimization techniques, ranging from custom kernels that make full use of the latest hardware to efficient serving solutions built on caching. Both offline and online experiments show that GESR substantially improves topline, engagement, and consumption metrics. To the best of our knowledge, this is the first successful deployment of full target-aware attention sequence modeling in an ESR stage at this scale.
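The abstract says only that HMA computes "raw match counts between user and item features". One plausible reading is sketched below as an assumption. Each feature is treated as a list of categorical IDs. For each configured user/item feature pair, the sketch counts how many user-side IDs also appear on the item side. The feature names and the `hard_match_counts` helper are hypothetical. How GESR feeds these counts into attention is not described here, so that step is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, List[int]]


def hard_match_counts(
    user_features: Features,
    item_features: Features,
    pairs: Sequence[Tuple[str, str]],
) -> List[int]:
    """Return one raw match count per (user_feature, item_feature) pair.

    Each occurrence of a user-side ID that also appears in the paired
    item-side list adds 1 (a hypothetical definition of "match count")."""
    counts = []
    for user_key, item_key in pairs:
        item_ids = set(item_features.get(item_key, []))
        counts.append(sum(1 for x in user_features.get(user_key, []) if x in item_ids))
    return counts


user = {"clicked_author_ids": [7, 7, 3, 9], "clicked_topic_ids": [100, 200]}
item = {"author_id": [7], "topic_ids": [200, 300]}
pairs = [("clicked_author_ids", "author_id"), ("clicked_topic_ids", "topic_ids")]
print(hard_match_counts(user, item, pairs))  # [2, 1]
```

Unlike a learned embedding similarity, these counts are exact and explicit. In the example, the user clicked two items by author 7, and the candidate is by author 7.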
Problem

Research questions and friction points this paper is trying to address.

Early stage ranking systems struggle to capture fine-grained user-item affinities
Existing designs lack effective cross-signal modeling between user and item features
Current approaches face efficiency challenges when implementing enriched interaction mechanisms
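The first two friction points can be seen in a toy decoupled model. The sketch below is a hypothetical two-tower setup in which the "user tower" is just the mean of the history embeddings, standing in for a learned network. The user vector is built without seeing the candidate, and user and item meet only in a final dot product. The function names are illustrative.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def user_tower(history: List[Vector]) -> Vector:
    # Hypothetical stand-in for a learned user tower: average the history embeddings.
    n = len(history)
    return [sum(col) / n for col in zip(*history)]


def decoupled_scores(history: List[Vector], candidates: List[Vector]) -> List[float]:
    # The user vector is computed once per request and never sees the candidate;
    # user and item representations meet only in this final dot product.
    u = user_tower(history)
    return [dot(u, item) for item in candidates]


history = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[2.0, 0.0], [1.0, 1.0]]
print(decoupled_scores(history, candidates))  # [1.0, 1.0]
```

In this toy example both candidates score 1.0, even though the first points in the same direction as the first history item. Averaging the history before seeing the candidate discards that distinction. A target-aware model could weight each history item by its relevance to the candidate, at a higher serving cost.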
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Attention (MoA) modules bridge the effectiveness gap
Multi-Logit Parameterized Gating refines specialized attention encodings
Custom kernels and caching enable efficient large-scale deployment
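The abstract describes MLPG only at a high level: it gates the MoA embeddings and produces secondary logits that are fused with the primary logit. The sketch below is one minimal interpretation under stated assumptions. Each MoA embedding gets a scalar sigmoid gate computed from a shared context vector, and a linear head maps the gated embedding to a secondary logit. Fusion is a plain sum with the primary logit. The function name, the shared context, the scalar gates, and sum-based fusion are all hypothetical choices, not details confirmed by the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mlpg_fuse(
    primary_logit: float,
    context: Vector,
    embeddings: List[Vector],
    gate_weights: List[Vector],
    head_weights: List[Vector],
) -> Tuple[float, List[float]]:
    """Hypothetical multi-logit gated fusion.

    For each MoA embedding k: gate_k = sigmoid(gate_weights[k] . context),
    secondary_k = head_weights[k] . (gate_k * embeddings[k]).
    Returns (primary_logit + sum of secondary logits, secondary logits)."""
    secondary = []
    for emb, g_w, h_w in zip(embeddings, gate_weights, head_weights):
        gate = sigmoid(dot(g_w, context))  # scalar gate in (0, 1)
        gated = [gate * x for x in emb]    # scale the whole embedding by the gate
        secondary.append(dot(h_w, gated))  # linear head -> secondary logit
    return primary_logit + sum(secondary), secondary


fused, secondary = mlpg_fuse(
    primary_logit=0.5,
    context=[1.0, -1.0],
    embeddings=[[2.0, 4.0]],    # one MoA output embedding
    gate_weights=[[0.0, 0.0]],  # zero weights -> gate = sigmoid(0) = 0.5
    head_weights=[[1.0, 1.0]],
)
print(fused, secondary)  # 3.5 [3.0]
```

In this sketch, a gate near zero shrinks an attention branch's contribution for a given example, while the primary logit passes through unchanged.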
Authors
Juhee Hong
Meta Platforms, Inc., Menlo Park, CA, USA
Meng Liu
Meta Platforms, Inc., Menlo Park, CA, USA
Shengzhi Wang
Meta Platforms, Inc., Menlo Park, CA, USA
Xiaoheng Mao
Meta Platforms, Inc., Menlo Park, CA, USA
Huihui Cheng
Meta Platforms, Inc., Menlo Park, CA, USA
Leon Gao
Meta Platforms, Inc., Menlo Park, CA, USA
Christopher Leung
Meta Platforms, Inc., Menlo Park, CA, USA
Jin Zhou
Meta Platforms, Inc., Menlo Park, CA, USA
Chandra Mouli Sekar
Meta Platforms, Inc., Menlo Park, CA, USA
Zhao Zhu
Meta Platforms, Inc., Menlo Park, CA, USA
Ruochen Liu
Assistant Professor, Beihang University
Additive manufacturing, Self-sustaining reaction, Polymer composites, Neuromorphic computing
Tuan Trieu
Meta Platforms, Inc., Menlo Park, CA, USA
Dawei Sun
Meta Platforms, Inc., Menlo Park, CA, USA
Jeet Kanjani
Meta Platforms, Inc., Menlo Park, CA, USA
Rui Li
Meta Platforms, Inc., Menlo Park, CA, USA
Jing Qian
Meta Platforms, Inc., Menlo Park, CA, USA
Xuan Cao
Meta Platforms, Inc., Menlo Park, CA, USA
Minjie Fan
Meta Platforms, Inc., Menlo Park, CA, USA
Mingze Gao
Macquarie University
Corporate Finance, Banking