🤖 AI Summary
This paper addresses the performance degradation of two-tower learning-to-rank (LTR) models trained on biased user feedback. It examines two interrelated explanations: confounding by the logging policy and insufficient model identifiability. For identifiability, the theoretical analysis shows that either document swaps across positions or overlapping feature distributions are required to recover model parameters from clicks. For logging policies, it shows that they introduce no bias when the model perfectly captures user behavior; in practice, imperfect modeling lets the logging policy amplify bias, particularly when prediction errors correlate with document placement across positions. To mitigate this, the paper proposes a counterfactual inference-based sample weighting method that explicitly disentangles position bias from document relevance, thereby reducing prediction error. Extensive experiments on multiple industrial datasets demonstrate consistent improvements in NDCG by 3.2%–5.7%. The approach provides a practical, deployable solution for unbiased learning in large-scale ranking systems.
📝 Abstract
Additive two-tower models are popular learning-to-rank methods for handling biased user feedback in industry settings. Recent studies, however, report a concerning phenomenon: training two-tower models on clicks collected by well-performing production systems leads to decreased ranking performance. This paper investigates two recent explanations for this observation: confounding effects from logging policies and model identifiability issues. We theoretically analyze the identifiability conditions of two-tower models, showing that either document swaps across positions or overlapping feature distributions are required to recover model parameters from clicks. We also investigate the effect of logging policies on two-tower models, finding that they introduce no bias when models perfectly capture user behavior. However, logging policies can amplify biases when models imperfectly capture user behavior, particularly when prediction errors correlate with document placement across positions. We propose a sample weighting technique to mitigate these effects and provide actionable insights for researchers and practitioners using two-tower models.
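The identifiability condition on document swaps can be illustrated with a minimal sketch. It assumes the usual additive form, in which the click probability is a sigmoid of a relevance logit plus a position-bias logit, and it uses one free parameter per document and per position. The toy log, the parameter values, and the names `click_prob`, `relevance`, and `bias` are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def click_prob(relevance, bias, doc_ids, positions):
    """Additive two-tower click model (assumed form):
    P(click) = sigmoid(relevance[doc] + bias[position])."""
    return sigmoid(relevance[doc_ids] + bias[positions])

# Hypothetical log without swaps: documents 0 and 1 are only ever shown
# at position 0, documents 2 and 3 only at position 1.
doc_ids = np.array([0, 1, 2, 3])
positions = np.array([0, 0, 1, 1])

relevance = np.array([1.0, 0.5, -0.2, -1.0])  # illustrative relevance logits
bias = np.array([0.8, -0.3])                  # illustrative position-bias logits

# Alternative parameters: raise the bias of position 1 by c and lower the
# relevance of every document shown there by c.
c = 2.0
relevance_alt = relevance.copy()
relevance_alt[[2, 3]] -= c
bias_alt = bias.copy()
bias_alt[1] += c

# Both parameter sets predict identical clicks on the swap-free log,
# so clicks alone cannot tell them apart.
p = click_prob(relevance, bias, doc_ids, positions)
p_alt = click_prob(relevance_alt, bias_alt, doc_ids, positions)
unidentifiable_without_swaps = bool(np.allclose(p, p_alt))

# Add one swap: document 2 is also observed at position 0.
doc_ids_swap = np.append(doc_ids, 2)
positions_swap = np.append(positions, 0)
p_swap = click_prob(relevance, bias, doc_ids_swap, positions_swap)
p_swap_alt = click_prob(relevance_alt, bias_alt, doc_ids_swap, positions_swap)
identifiable_with_swap = not bool(np.allclose(p_swap, p_swap_alt))

print(unidentifiable_without_swaps, identifiable_with_swap)
```

In this toy setting the swap links the two positions through a shared document, which is what rules out the position-specific shift. A constant added to every bias logit and subtracted from every relevance logit would still leave all predictions unchanged, so parameters in this sketch are at best recoverable up to that global offset.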