Meta Off-Policy Estimation

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses a failure mode of conventional ensemble methods for off-policy evaluation (OPE): when multiple OPE estimators are computed from the same observational data, they are statistically dependent, which violates the independence assumption those methods rely on. The authors propose a meta-analytic fusion framework based on fixed-effects modeling that explicitly characterizes the covariance structure among the OPE estimators and their confidence intervals. The approach constructs the Best Linear Unbiased Estimator (BLUE) under this dependency model and derives conservative confidence intervals. It is presented as the first systematic incorporation of inter-estimator correlation into OPE, relaxing the restrictive independence assumption. Experiments on synthetic and real-world recommendation system datasets show that the method reduces estimation variance (by 32% on average) and outperforms every individual OPE estimator while preserving unbiasedness, improving both the statistical efficiency and the reliability of policy evaluation.

📝 Abstract
Off-policy estimation (OPE) methods enable unbiased offline evaluation of recommender systems, directly estimating the online reward some target policy would have obtained, from offline data and with statistical guarantees. The theoretical elegance of the framework, combined with practical successes, has led to a surge of interest, with many competing estimators now available to practitioners and researchers. Among these, Doubly Robust methods provide a prominent strategy to combine value- and policy-based estimators. In this work, we take an alternative perspective to combine a set of OPE estimators and their associated confidence intervals into a single, more accurate estimate. Our approach leverages a correlated fixed-effects meta-analysis framework, explicitly accounting for dependencies among estimators that arise due to shared data. This yields a best linear unbiased estimate (BLUE) of the target policy's value, along with an appropriately conservative confidence interval that reflects inter-estimator correlation. We validate our method on both simulated and real-world data, demonstrating improved statistical efficiency over existing individual estimators.
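
For orientation, the textbook form of a best linear unbiased combination under a known covariance can be written as below. The notation is an assumption, not taken from the abstract: $\hat{\theta} \in \mathbb{R}^K$ stacks the $K$ OPE estimates, $\Sigma$ is their covariance matrix (its off-diagonal entries carry the dependence induced by shared data), and $\mathbf{1}$ is the all-ones vector. The paper's exact construction, in particular how $\Sigma$ and the conservative interval are obtained, may differ.

```latex
\hat{\theta}_{\mathrm{BLUE}} = w^\top \hat{\theta}, \qquad
w = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}}, \qquad
\operatorname{Var}\bigl(\hat{\theta}_{\mathrm{BLUE}}\bigr)
  = \frac{1}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}}
```

The weights sum to one, so the combination stays unbiased whenever each input estimator is unbiased. When $\Sigma$ is diagonal, the weights reduce to ordinary inverse-variance weighting, which is the independent fixed-effects case.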
Problem

Research questions and friction points this paper is trying to address.

Combine multiple OPE estimators for accurate offline evaluation
Account for dependencies among estimators due to shared data
Improve statistical efficiency over existing individual estimators
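
To make the shared-data dependence concrete, here is a minimal sketch that computes two common OPE estimators (IPS and self-normalised IPS) on the same toy log and estimates their correlation with a bootstrap. Everything in it is an illustrative assumption: the three-action policies, the click probabilities, the function names, and the use of a bootstrap, which is not necessarily how the paper estimates dependence.

```python
import random


def ips_and_snips(weights, rewards):
    """Return (IPS, SNIPS) value estimates from importance weights and rewards."""
    weighted_reward = sum(w * r for w, r in zip(weights, rewards))
    ips = weighted_reward / len(weights)      # normalise by sample size
    snips = weighted_reward / sum(weights)    # normalise by total weight
    return ips, snips


def bootstrap_correlation(weights, rewards, n_boot=200, seed=0):
    """Estimate corr(IPS, SNIPS) by resampling the shared log with replacement."""
    rng = random.Random(seed)
    n = len(weights)
    ips_draws, snips_draws = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ips, snips = ips_and_snips(
            [weights[i] for i in idx], [rewards[i] for i in idx]
        )
        ips_draws.append(ips)
        snips_draws.append(snips)
    mean_i = sum(ips_draws) / n_boot
    mean_s = sum(snips_draws) / n_boot
    cov = sum((a - mean_i) * (b - mean_s) for a, b in zip(ips_draws, snips_draws))
    var_i = sum((a - mean_i) ** 2 for a in ips_draws)
    var_s = sum((b - mean_s) ** 2 for b in snips_draws)
    return cov / (var_i * var_s) ** 0.5


# Hypothetical toy setup: 3 actions, no context.
logging_policy = [0.5, 0.3, 0.2]   # policy that produced the log
target_policy = [0.2, 0.3, 0.5]    # policy we want to evaluate
click_prob = [0.1, 0.2, 0.4]       # true reward rate per action

rng = random.Random(42)
actions = rng.choices(range(3), weights=logging_policy, k=1000)
weights = [target_policy[a] / logging_policy[a] for a in actions]
rewards = [1.0 if rng.random() < click_prob[a] else 0.0 for a in actions]

rho = bootstrap_correlation(weights, rewards)
print(f"bootstrap corr(IPS, SNIPS) = {rho:.2f}")
```

Because both estimators reuse the same weighted rewards, their sampling errors move together, so treating them as independent pieces of evidence would double-count the information in the log.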
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines multiple OPE estimators using meta-analysis
Accounts for dependencies among estimators computed from shared data
Provides best linear unbiased estimate with confidence intervals
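
The three points above can be sketched numerically as follows. This is a generic generalised-least-squares combination under stated assumptions, not the paper's implementation: `combine_blue`, `solve`, the example estimates, the standard errors, and the correlation matrix are all hypothetical, and the interval is a plain normal-approximation interval rather than the paper's conservative one.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def combine_blue(estimates, std_errors, corr, z=1.96):
    """Combine correlated estimates; return (pooled, pooled_se, interval, weights)."""
    k = len(estimates)
    # Covariance from standard errors and an assumed correlation matrix.
    cov = [
        [corr[i][j] * std_errors[i] * std_errors[j] for j in range(k)]
        for i in range(k)
    ]
    sigma_inv_one = solve(cov, [1.0] * k)   # Sigma^{-1} 1
    total = sum(sigma_inv_one)              # 1' Sigma^{-1} 1
    weights = [v / total for v in sigma_inv_one]
    pooled = sum(w * e for w, e in zip(weights, estimates))
    pooled_se = math.sqrt(1.0 / total)
    interval = (pooled - z * pooled_se, pooled + z * pooled_se)
    return pooled, pooled_se, interval, weights


# Hypothetical inputs: three OPE estimates of the same policy value.
estimates = [0.30, 0.26, 0.28]
std_errors = [0.04, 0.03, 0.03]
correlated = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

for label, corr in [("correlated", correlated), ("independent", independent)]:
    pooled, se, (lo, hi), w = combine_blue(estimates, std_errors, corr)
    print(f"{label}: estimate={pooled:.4f} se={se:.4f} ci=({lo:.4f}, {hi:.4f})")
```

Running both settings side by side shows how much the reported uncertainty depends on the assumed correlation. Standard errors could be recovered from reported confidence intervals as half-width divided by `z`, though that step is also an assumption here.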