An Efficient Embedding Based Ad Retrieval with GPU-Powered Feature Interaction

📅 2025-11-27
🤖 AI Summary
Problem: In large-scale ad retrieval, dual-tower models have limited representational capacity because user and ad features interact only at the final inner-product layer. Method: This paper proposes a GPU-accelerated retrieval framework with deep feature interaction. It brings the Wide & Deep architecture, which to the authors' knowledge had not previously been implemented in an industrial retrieval system, to the retrieval stage for more expressive joint modeling of user and ad features; designs a compressed inverted list so that cross-feature computation stays efficient at scale; and uses GPU parallelism to keep end-to-end retrieval low-latency and high-throughput. Contribution/Results: Deployed in Tencent's advertising system, the framework reports offline AUC gains and online improvements in both QPS and CTR, improving accuracy and efficiency while remaining practical to engineer.

📝 Abstract
In large-scale advertising recommendation systems, retrieval serves as a critical component, aiming to efficiently select a subset of candidate ads relevant to user behaviors from a massive ad inventory for subsequent ranking and recommendation. The Embedding-Based Retrieval (EBR) methods modeled by the dual-tower network are widely used in the industry to maintain both retrieval efficiency and accuracy. However, the dual-tower model has significant limitations: the embeddings of users and ads interact only at the final inner product computation, resulting in insufficient feature interaction capabilities. Although DNN-based models with both user and ad as input features, allowing for early-stage interaction between these features, are introduced in the ranking stage to mitigate this issue, they are computationally infeasible for the retrieval stage. To bridge this gap, this paper proposes an efficient GPU-based feature interaction for the dual-tower network to significantly improve retrieval accuracy while substantially reducing computational costs. Specifically, we introduce a novel compressed inverted list designed for GPU acceleration, enabling efficient feature interaction computation at scale. To the best of our knowledge, this is the first framework in the industry to successfully implement Wide and Deep in a retrieval system. We apply this model to the real-world business scenarios in Tencent Advertising, and experimental results demonstrate that our method outperforms existing approaches in offline evaluation and has been successfully deployed to Tencent's advertising recommendation system, delivering significant online performance gains. This improvement not only validates the effectiveness of the proposed method, but also provides new practical guidance for optimizing large-scale ad retrieval systems.
Problem

Research questions and friction points this paper is trying to address.

Enhancing feature interaction in dual-tower ad retrieval models
Reducing computational costs for early-stage feature interaction
Implementing GPU-accelerated retrieval to improve accuracy efficiently
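
The first point above can be made concrete with a small sketch. It contrasts a dual-tower score, where the user and ad meet only in one inner product, with a Wide & Deep style score that adds a weight for each matching user-ad feature pair. This is an illustration, not the paper's model: the function names, the feature strings, and the `cross_weights` table are all hypothetical, and a production system would learn these weights rather than hard-code them.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], a: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, a))


def dual_tower_score(user_emb: List[float], ad_emb: List[float]) -> float:
    # The only user-ad interaction is this single inner product.
    return dot(user_emb, ad_emb)


def wide_and_deep_score(
    user_emb: List[float],
    ad_emb: List[float],
    user_feats: List[str],
    ad_feats: List[str],
    cross_weights: Dict[Tuple[str, str], float],
) -> float:
    # Deep part: same inner product as the dual-tower model.
    deep = dot(user_emb, ad_emb)
    # Wide part (assumed form): sum of weights over explicit
    # (user feature, ad feature) crosses; unknown pairs contribute 0.
    wide = sum(
        cross_weights.get((uf, af), 0.0)
        for uf in user_feats
        for af in ad_feats
    )
    return deep + wide


# Hypothetical toy inputs.
user_emb = [0.5, -1.0, 2.0]
ad_emb = [1.0, 0.5, 0.25]
cross_weights = {("age:25-34", "category:games"): 0.75}

base = dual_tower_score(user_emb, ad_emb)
full = wide_and_deep_score(
    user_emb, ad_emb,
    ["age:25-34", "city:SZ"], ["category:games"],
    cross_weights,
)
print(base, full)
```

The wide term is what makes naive retrieval expensive: evaluated this way, it requires a pass over feature pairs for every candidate ad, which is the cost the second and third points aim to reduce.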
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU-based feature interaction in dual-tower network
Novel compressed inverted list for GPU acceleration
First framework implementing Wide and Deep in retrieval
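
This summary does not describe the layout of the compressed inverted list, so the following is only a sketch of one plausible design. It assumes posting lists keyed by a user-side feature, ad ids stored as deltas between sorted neighbours, and all lists flattened into offset-indexed arrays, a layout that tends to suit parallel hardware. Every name here (`build_index`, `wide_scores`, the feature keys, the weights) is hypothetical, and the loop runs sequentially on the CPU rather than on a GPU.

```python
from itertools import accumulate
from typing import Dict, List, Tuple

Index = Tuple[Dict[str, int], List[int], List[int], List[float]]


def build_index(postings: Dict[str, List[Tuple[int, float]]]) -> Index:
    """Flatten posting lists into offset-indexed arrays with delta-encoded ad ids."""
    keys: Dict[str, int] = {}
    offsets: List[int] = [0]
    deltas: List[int] = []
    weights: List[float] = []
    for row, (key, entries) in enumerate(sorted(postings.items())):
        keys[key] = row
        prev = 0
        for ad_id, w in sorted(entries):
            deltas.append(ad_id - prev)  # small gaps compress better than raw ids
            weights.append(w)
            prev = ad_id
        offsets.append(len(deltas))
    return keys, offsets, deltas, weights


def wide_scores(index: Index, active_keys: List[str], num_ads: int) -> List[float]:
    """Accumulate cross-feature weights per ad for one user's active features."""
    keys, offsets, deltas, weights = index
    scores = [0.0] * num_ads
    for key in active_keys:
        row = keys.get(key)
        if row is None:  # feature with no posting list contributes nothing
            continue
        start, end = offsets[row], offsets[row + 1]
        # A running sum over the deltas recovers the original ad ids.
        for ad_id, w in zip(accumulate(deltas[start:end]), weights[start:end]):
            scores[ad_id] += w
    return scores


# Hypothetical toy data: feature -> [(ad_id, weight), ...]
postings = {
    "age:25-34": [(0, 0.5), (3, 0.25)],
    "city:SZ": [(3, 0.5), (1, 1.0)],
}
index = build_index(postings)
scores = wide_scores(index, ["age:25-34", "city:SZ", "unseen"], num_ads=4)
print(scores)
```

With this layout only the ads that share a feature with the user are touched, and each posting list is a contiguous slice, so one list (or one entry) could in principle be assigned to a parallel worker. The resulting per-ad wide scores would then be added to the dual-tower inner product.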
Yifan Lei
Tencent Inc.
Jiahua Luo
Tencent Inc.
Tingyu Jiang
Tencent Inc.
Bo Zhang
Tencent Inc.
Lifeng Wang
Institute of Advanced Science Facilities, Shenzhen
High-order harmonic generation, attosecond physics
Dapeng Liu
Tencent Inc.
Zhaoren Wu
Tencent Inc.
Haijie Gu
Tencent Inc.
Huan Yu
Tencent Inc.
Jie Jiang
Tencent Inc.