Unidentified and Confounded? Understanding Two-Tower Models for Unbiased Learning to Rank

📅 2025-06-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper examines why additive two-tower learning-to-rank (LTR) models can lose ranking performance when trained on clicks logged by well-performing production systems. It studies two explanations: confounding by the logging policy and limited model identifiability. The theoretical analysis shows that recovering model parameters from clicks requires either document swaps across positions or overlapping feature distributions. Logging policies introduce no bias when the model captures user behavior perfectly; when it does not, they can amplify bias, particularly when prediction errors correlate with document placement across positions. To mitigate this, the paper proposes a sample weighting technique and offers practical guidance for researchers and practitioners using two-tower models.

📝 Abstract

Additive two-tower models are popular learning-to-rank methods for handling biased user feedback in industry settings. Recent studies, however, report a concerning phenomenon: training two-tower models on clicks collected by well-performing production systems leads to decreased ranking performance. This paper investigates two recent explanations for this observation: confounding effects from logging policies and model identifiability issues. We theoretically analyze the identifiability conditions of two-tower models, showing that either document swaps across positions or overlapping feature distributions are required to recover model parameters from clicks. We also investigate the effect of logging policies on two-tower models, finding that they introduce no bias when models perfectly capture user behavior. However, logging policies can amplify biases when models imperfectly capture user behavior, particularly when prediction errors correlate with document placement across positions. We propose a sample weighting technique to mitigate these effects and provide actionable insights for researchers and practitioners using two-tower models.
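As a rough illustration of the additive structure described in the abstract, the sketch below sums a relevance logit and a position-bias logit before applying a sigmoid. The function name `click_probability` and all numeric values are hypothetical and not taken from the paper. The example also shows the simplest form of the identifiability problem: shifting one tower up and the other down by the same constant leaves every click prediction unchanged, so clicks alone can pin the parameters down only up to such a shift at best.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def click_probability(relevance_logit, bias_logit):
    """Additive two-tower prediction: sum both logits, then apply a sigmoid."""
    return sigmoid(relevance_logit + bias_logit)


# Hypothetical toy values: three documents, each shown at one position.
relevance_logits = np.array([1.5, 0.2, -0.7])  # relevance tower output per document
bias_logits = np.array([0.0, -0.8, -1.6])      # bias tower output per position

# Document i is displayed at position i.
p = click_probability(relevance_logits, bias_logits)

# Move a constant from one tower to the other: predictions do not change.
c = 0.5
p_shifted = click_probability(relevance_logits + c, bias_logits - c)

print(np.allclose(p, p_shifted))
```

Because the ranking induced by the relevance tower is unaffected by a single global shift, this particular ambiguity is harmless; the concern raised in the abstract is about cases where the data permit more than one such shift.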

Problem

Research questions and friction points this paper is trying to address.

Under what conditions can the parameters of a two-tower ranking model be identified from click data?

How do logging policies confound two-tower models and affect their bias?

How can bias amplification be mitigated when the model imperfectly captures user behavior?

Innovation

Methods, ideas, or system contributions that make the work stand out.

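Shows theoretically that identifying two-tower model parameters from clicks requires either document swaps across positions or overlapping feature distributions

Finds that logging policies introduce no bias under perfect user behavior modeling but can amplify bias when prediction errors correlate with document placement

Proposes a sample weighting technique to mitigate this bias amplification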
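The document-swap condition from the identifiability analysis can be illustrated with a small connectivity check. This is a sketch under stated assumptions, not the paper's formal condition: it assumes a relevance tower with one free parameter per document (so no feature overlap can help) and treats two positions as linked when at least one document was logged at both. The function `position_components` and the toy logs are hypothetical. If the positions fall into several unlinked groups, each group's bias logits could be shifted independently without changing any predicted click probability.

```python
from collections import defaultdict


def position_components(impressions, num_positions):
    """Group positions that are linked by a document shown at both.

    impressions: iterable of (doc_id, position) pairs, positions 0-indexed.
    Returns a sorted list of position groups (union-find over positions).
    """
    parent = list(range(num_positions))

    def find(k):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    first_seen = {}
    for doc, pos in impressions:
        if doc in first_seen:
            # The same document at two positions links those positions.
            parent[find(first_seen[doc])] = find(pos)
        else:
            first_seen[doc] = pos

    groups = defaultdict(list)
    for k in range(num_positions):
        groups[find(k)].append(k)
    return sorted(groups.values())


# Deterministic logging policy: every document always sits at the same position.
deterministic = [("a", 0), ("b", 1), ("c", 2)]
# Logging with swaps: documents "a" and "b" each appear at two positions.
with_swaps = [("a", 0), ("a", 1), ("b", 1), ("b", 2), ("c", 2)]

no_swap_groups = position_components(deterministic, 3)
swap_groups = position_components(with_swaps, 3)
print(no_swap_groups)
print(swap_groups)
```

In the deterministic log every position forms its own group, while the log with swaps links all three positions into one.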
