DUET -- Dual User Embedding Transformers for Offsite Conversion Prediction

📅 2026-06-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses offsite conversion rate (OCVR) prediction, where conversion signals are sparse, delayed, and difficult to attribute, in contrast to dense click signals. The authors propose a heterogeneous dual-stream pretraining architecture with a dedicated Transformer encoder for each stream, tailored to the distinct statistical characteristics of click and conversion sequences. The click stream uses multi-layer self-attention, while the conversion stream alternates between cross-attention and self-attention. The embeddings from both streams are then consumed jointly by a downstream ranking model, so that both signal types inform the ranker within strict online latency constraints. Experiments show up to a 0.38% reduction in offline normalized entropy (NE) relative to the strongest baseline, and an A/B test shows consistent improvements in OCVR prediction accuracy.
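The dual-stream layout described above can be illustrated with a minimal sketch. This is not the authors' implementation: it uses attention with identity projections (no learned weights), mean pooling, and concatenation as the fusion step, all of which are assumptions. The summary does not say what the conversion stream's cross-attention attends over, so the `context` sequence and every function name below are hypothetical.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys_values):
    # Scaled dot-product attention with identity projections (assumption:
    # a real encoder would use learned Q/K/V projections and multiple heads).
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(d)])
    return out

def residual(xs, ys):
    # Element-wise residual connection between two equally shaped sequences.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def mean_pool(xs):
    # Collapse a sequence of vectors into one embedding (assumed pooling).
    return [sum(col) / len(xs) for col in zip(*xs)]

def encode_clicks(clicks, num_layers=2):
    # Dense click stream: stacked self-attention layers.
    h = clicks
    for _ in range(num_layers):
        h = residual(h, attend(h, h))
    return mean_pool(h)

def encode_conversions(conversions, context, num_layers=2):
    # Sparse conversion stream: alternate cross-attention and self-attention.
    # `context` is a hypothetical sequence; the source does not specify it.
    h = conversions
    for _ in range(num_layers):
        h = residual(h, attend(h, context))  # cross-attention
        h = residual(h, attend(h, h))        # self-attention
    return mean_pool(h)

def dual_stream_features(clicks, conversions, context):
    # Assumed fusion: concatenate the two user embeddings for the ranker.
    return encode_clicks(clicks) + encode_conversions(conversions, context)

random.seed(0)
dim = 4
clicks = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
conversions = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
context = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
features = dual_stream_features(clicks, conversions, context)
print(len(features))  # 8: one dim-4 embedding per stream, concatenated
```

In this sketch the two encoders share nothing, which mirrors the idea of giving each stream its own architecture; how the pretrained embeddings are actually served to the ranker is not covered here.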
📝 Abstract
Offsite conversion rate (OCVR) prediction is an important ranking problem in computational recommendation systems. This task presents a modeling challenge: click signals are abundant and exhibit short temporal horizons, whereas conversion signals are inherently sparse, long-delayed, and frequently unattributed. Despite these statistical disparities, both signal types must inform models that operate within strict serving-latency constraints. Prior pre-training approaches address this heterogeneity with a single, undifferentiated encoder applied uniformly across both data streams. We propose DUET (Dual User Embedding Transformers), a framework that explicitly partitions user behavioral data into two domain-coherent streams -- clicks and conversions -- and pre-trains dedicated transformer encoders with architectures tailored to each stream's statistical characteristics: multi-layer self-attention for the dense click stream and interleaved cross- and self-attention for the sparse conversion stream. The resulting complementary embeddings are jointly consumed by a downstream ranker without exceeding serving-latency budgets. Evaluation demonstrates up to a 0.38% normalized entropy (NE) reduction relative to the strongest baseline, and an A/B test shows consistent improvements in OCVR prediction accuracy.
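The abstract reports results in normalized entropy (NE) without defining it. The sketch below assumes the definition commonly used in ads ranking: average log loss divided by the entropy of the empirical positive rate, so a model that always predicts the base rate scores 1.0 and lower is better. The function names are illustrative, and the NE values in the usage lines are made up to show the arithmetic, not taken from the paper.

```python
import math

def normalized_entropy(labels, preds, eps=1e-12):
    # Average log loss of the predictions, normalized by the entropy of
    # the empirical positive rate (assumed definition of NE).
    n = len(labels)
    log_loss = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        log_loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    log_loss /= n
    rate = sum(labels) / n
    if rate <= 0.0 or rate >= 1.0:
        raise ValueError("NE is undefined when all labels are identical")
    base_entropy = -(rate * math.log(rate) + (1 - rate) * math.log(1 - rate))
    return log_loss / base_entropy

def relative_ne_reduction(ne_baseline, ne_model):
    # Percentage by which the model's NE is lower than the baseline's.
    return 100.0 * (ne_baseline - ne_model) / ne_baseline

# Predicting the base rate for every example gives NE of 1.0.
labels = [1, 0, 0, 0]
ne_constant = normalized_entropy(labels, [0.25] * 4)

# Illustrative numbers only: an NE drop from 1.0000 to 0.9962 is a 0.38% reduction.
reduction = relative_ne_reduction(1.0, 0.9962)
```

Because NE is a ratio against the base-rate entropy, it stays comparable across datasets with different conversion rates, which is one reason small relative changes such as 0.38% are reported as percentages.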
Problem

Research questions and friction points this paper is trying to address.

offsite conversion rate prediction
click signals
conversion signals
serving latency
recommendation systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-stream modeling
Transformer architecture
Offsite conversion prediction
Pre-training
Sparse signal modeling
Reazul Hasan Russel
AI at Meta
Mingwei Tang
AI at Meta
Rostam Shirani
AI at Meta
Xinlong Liu
Navid Madani
PhD Student
Natural Language Processing, Dialogue Systems, Large Language Models, Computational Social Science
Leo Ding
Yawen He
Xiangyu Wang
Professor, Curtin University
Civil Engineering, Building Information Modeling, Smart City, Automation and Robotics, Smart
Mustafa Acar
Michigan State University
Ashish Katiyar
Yuhai Li
Alan Yang
Metarya Ruparel
Derek Qiang Xu
Rupert Wu
Rui Yang
Liang Tao
Tencent Technology Co., Ltd.
Deep Learning Multimodal NLP
Xinyi Zhao
Columbia university
Data Science, Data Visualization
Larry Zhang
Sri Reddy
Rob Malkin