Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems

📅 2025-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Offline evaluation of retrieval-ranking systems suffers from distorted and non-generalizable estimates due to non-random missingness (MNAR) in user interaction data, inducing selection, exposure, position, and conformity biases. This paper proposes the first unified framework integrating causal modeling and information theory. First, it constructs an explicit causal graph to characterize bias mechanisms and applies importance reweighting to transform MNAR data into approximately missing-at-random (MAR) data. Second, it designs a system-agnostic, model-transferable debiasing architecture. Third, it introduces a neural estimator of mutual information as a robust, black-box optimization objective for evaluation. Evaluated across multiple benchmarks, the method significantly improves consistency between offline evaluation metrics and online A/B test outcomes, achieving state-of-the-art accuracy and cross-domain generalizability.
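The paper does not spell out its reweighting estimator here, but the importance-reweighting step it describes typically follows the standard inverse propensity scoring (IPS) recipe: each observed interaction is weighted by the inverse of its exposure probability so the MNAR sample behaves like a MAR one. A minimal sketch under that assumption (the function name and clipping threshold are illustrative, not from the paper):

```python
import numpy as np

def ips_weighted_metric(rewards, propensities, clip=10.0):
    """Self-normalized IPS estimate of a metric's mean (illustrative sketch).

    rewards:      observed per-interaction metric values (e.g. clicks).
    propensities: estimated probability each interaction was observed.
    clip:         cap on 1/propensity to bound the estimator's variance.
    """
    weights = np.minimum(1.0 / np.asarray(propensities, dtype=float), clip)
    rewards = np.asarray(rewards, dtype=float)
    # Self-normalized weighted mean: upweights rarely exposed items so the
    # average reflects the full item population, not just popular items.
    return float(np.sum(weights * rewards) / np.sum(weights))
```

With uniform propensities the weights cancel and the estimate reduces to the plain mean; non-uniform propensities shift weight toward under-exposed items, which is the debiasing effect the summary refers to.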

📝 Abstract
Evaluating retrieval-ranking systems is crucial for developing high-performing models. While online A/B testing is the gold standard, its high cost and risks to user experience make effective offline methods necessary. However, relying on historical interaction data introduces biases (such as selection, exposure, conformity, and position biases) that distort evaluation metrics: driven by the Missing-Not-At-Random (MNAR) nature of user interactions, they favor popular or frequently exposed items over true user preferences. We propose a novel framework for robust offline evaluation of retrieval-ranking systems that transforms MNAR data into Missing-At-Random (MAR) data through reweighting combined with black-box optimization, guided by neural estimation of information-theoretic metrics. Our contributions include (1) a causal formulation for addressing offline evaluation biases, (2) a system-agnostic debiasing framework, and (3) empirical validation of its effectiveness. This framework enables more accurate, fair, and generalizable evaluations, enhancing model assessment before deployment.
Problem

Research questions and friction points this paper is trying to address.

Address biases in offline ranking system evaluation
Transform MNAR data to MAR for accurate metrics
Develop system-agnostic debiasing framework for fairness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reweighting MNAR data into MAR
Black-box optimization for debiasing
Neural estimation of information-theoretic metrics
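The summary names a neural estimator of mutual information as the black-box objective but gives no details; MINE-style estimators usually maximize the Donsker-Varadhan lower bound on I(X;Y) with a learned critic network. A minimal sketch of that bound, with the critic left abstract as a callable (an assumption, since the paper's architecture is not specified here):

```python
import numpy as np

def dv_lower_bound(critic, joint_pairs, marginal_pairs):
    """Donsker-Varadhan lower bound on mutual information I(X;Y).

    critic:         callable scoring an (x, y) pair; in MINE this is a
                    neural network trained to maximize this bound.
    joint_pairs:    samples (x_i, y_i) drawn from the joint p(x, y).
    marginal_pairs: pairs with y shuffled, approximating p(x)p(y).
    """
    t_joint = np.array([critic(x, y) for x, y in joint_pairs])
    t_marg = np.array([critic(x, y) for x, y in marginal_pairs])
    # E_joint[T] - log E_marginal[exp(T)]: tight when T approximates
    # the log density ratio log p(x, y) / (p(x) p(y)).
    return float(t_joint.mean() - np.log(np.exp(t_marg).mean()))
```

A constant critic yields a bound of exactly zero; a critic that scores dependent pairs higher than shuffled ones yields a positive bound, which is the signal a black-box optimizer can maximize.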