Generative Early Stage Ranking

📅 2025-11-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: In the early-stage ranking (ESR) of large-scale recommender systems, user and item representations are learned independently and combined only at the final layer, which limits fine-grained cross-feature modeling. Method: This paper proposes a generative ESR paradigm built on full target-aware attentional sequence modeling, which the authors believe is the first deployment of its kind at this scale in an ESR stage. It combines a hybrid attention module (hard-matching attention, target-aware self-attention, and cross-attention), multi-logit gated fusion, and custom kernel optimizations. Contribution/Results: The approach overcomes the performance ceiling of decoupled architectures while maintaining high throughput and low latency. The authors report improvements in ranking accuracy, user engagement, and content consumption metrics, validated by offline evaluations and large-scale online A/B tests, and the method has been deployed in a production recommender system.
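To make the contrast between decoupled and target-aware modeling concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the two-dimensional toy vectors, and the use of scaled dot-product attention with the candidate item as the query are all assumptions chosen for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decoupled_score(user_vec, item_vec):
    # Decoupled style: user_vec was computed without seeing the item,
    # so the only interaction is this final dot product.
    return dot(user_vec, item_vec)

def target_aware_user_vec(history, target_item):
    # Assumed form: attention over the user's history embeddings,
    # using the candidate item embedding as the query.
    scale = math.sqrt(len(target_item))
    weights = softmax([dot(h, target_item) / scale for h in history])
    dim = len(target_item)
    return [sum(w * h[d] for w, h in zip(weights, history)) for d in range(dim)]

# Toy data (hypothetical): two history items pointing in different directions.
history = [[1.0, 0.0], [0.0, 1.0]]
item_a, item_b = [4.0, 0.0], [0.0, 4.0]

user_for_a = target_aware_user_vec(history, item_a)
user_for_b = target_aware_user_vec(history, item_b)
```

In this toy setup the user representation shifts toward whichever history entry resembles the candidate, whereas a decoupled model would reuse one fixed user vector for both items.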

📝 Abstract

Large-scale recommender systems commonly adopt a multi-stage cascading ranking paradigm to balance effectiveness and efficiency. Early Stage Ranking (ESR) systems use the "user-item decoupling" approach, in which independently learned user and item representations are combined only at the final layer. While efficient, this design is limited in effectiveness, as it struggles to capture fine-grained user-item affinities and cross-signals. To address these limitations, we propose the Generative Early Stage Ranking (GESR) paradigm, introducing the Mixture of Attention (MoA) module, which leverages diverse attention mechanisms to bridge the effectiveness gap: the Hard Matching Attention (HMA) module encodes explicit cross-signals by computing raw match counts between user and item features; the Target-Aware Self Attention module generates target-aware user representations conditioned on the item, enabling more personalized learning; and the Cross Attention modules facilitate earlier and richer interactions between user and item features. MoA's specialized attention encodings are further refined in the final layer through a Multi-Logit Parameterized Gating (MLPG) module, which integrates the newly learned embeddings via gating and produces secondary logits that are fused with the primary logit. To address the efficiency and latency challenges, we introduce a comprehensive suite of optimization techniques, ranging from custom kernels that maximize the capabilities of the latest hardware to efficient serving solutions powered by caching mechanisms. The proposed GESR paradigm has shown substantial improvements in topline metrics, engagement, and consumption tasks, as validated by both offline and online experiments. To the best of our knowledge, this marks the first successful deployment of full target-aware attention sequence modeling within an ESR stage at such a scale.
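The abstract describes Hard Matching Attention as computing raw match counts between user and item features. A rough sketch of that idea, assuming the user side is a list of past events with categorical fields and the item side is a single dictionary of the same fields, could look like the following. The field names (`author_id`, `topic`) and the function `hard_match_counts` are hypothetical, and how the paper turns such counts into an encoding is not specified here.

```python
from collections import Counter

def hard_match_counts(user_history, item_features):
    """For each item field, count the history events that share its value."""
    counts = {}
    for field, value in item_features.items():
        # Tally how often each value of this field appears in the history.
        field_counter = Counter(event.get(field) for event in user_history)
        counts[field] = field_counter[value]  # Counter returns 0 if absent
    return counts

# Hypothetical toy data.
history = [
    {"author_id": 7, "topic": "cooking"},
    {"author_id": 7, "topic": "travel"},
    {"author_id": 3, "topic": "cooking"},
]
candidate = {"author_id": 7, "topic": "music"}

counts = hard_match_counts(history, candidate)
print(counts)
```

Counts like these are explicit cross-signals: they depend on both the user and the candidate, so a decoupled model cannot produce them before its final layer.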

Problem

Research questions and friction points this paper is trying to address.

Early stage ranking systems struggle with capturing fine-grained user-item affinities

Existing designs lack effective cross-signal modeling between user and item features

Current approaches face efficiency challenges when implementing enriched interaction mechanisms

Innovation

Methods, ideas, or system contributions that make the work stand out.

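A Mixture of Attention (MoA) module combines hard-matching, target-aware self-attention, and cross-attention to bridge the effectiveness gap left by user-item decoupling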

Multi-Logit Parameterized Gating refines specialized attention encodings

Custom kernels and caching enable efficient large-scale deployment
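The gating step can be pictured with a small sketch. The summary only says that MLPG gates the attention embeddings and produces secondary logits that are fused with the primary logit; the scalar sigmoid gate, the linear logit heads, and the additive fusion below are assumptions made for illustration, and `fuse_logits` is a hypothetical name.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse_logits(primary_logit, embeddings, gate_weights, head_weights):
    """Add one gated secondary logit per attention embedding to the primary logit."""
    fused = primary_logit
    for emb, g_w, h_w in zip(embeddings, gate_weights, head_weights):
        gate = sigmoid(dot(g_w, emb))    # assumed: scalar gate in (0, 1)
        secondary_logit = dot(h_w, emb)  # assumed: linear head per embedding
        fused += gate * secondary_logit  # assumed: additive fusion
    return fused

# Hypothetical toy values: one attention embedding of dimension 2.
fused = fuse_logits(
    primary_logit=0.3,
    embeddings=[[1.0, 0.0]],
    gate_weights=[[0.0, 0.0]],   # gate = sigmoid(0) = 0.5
    head_weights=[[2.0, 0.0]],   # secondary logit = 2.0
)
```

With these values the gate is 0.5 and the secondary logit is 2.0, so the fused logit is 0.3 + 0.5 * 2.0 = 1.3. A gate near zero would leave the primary logit almost unchanged.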

🔎 Similar Papers

Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers