🤖 AI Summary
Recommendation systems suffer from distorted user preference representations due to selection bias, undermining both accuracy and fairness. Existing causal debiasing methods commonly assume mutual independence among latent exogenous variables, a strong assumption frequently violated in practice. This work is the first to relax this independence assumption in recommendation debiasing, proposing a unified likelihood-maximization framework that explicitly models the joint influence of unobserved confounders and correlated latent exogenous variables. The approach integrates structural causal modeling, latent variable modeling, and Monte Carlo likelihood estimation, with the data-generating process modeled under Gaussian assumptions. Extensive experiments on synthetic data and three real-world benchmarks show that the method outperforms mainstream debiasing baselines, including inverse propensity weighting (IPW) and doubly robust (DR) estimators, while improving both recommendation accuracy and fairness. The implementation is publicly available.
📝 Abstract
Recommendation systems (RS) aim to provide personalized content, but they face a challenge in unbiased learning due to selection bias, where users only interact with items they prefer. This bias leads to a distorted representation of user preferences, which hinders the accuracy and fairness of recommendations. To address this issue, various methods have been developed, including error-imputation-based, inverse propensity scoring, and doubly robust techniques. Despite this progress, from the structural causal model perspective, previous debiasing methods in RS assume independence among the exogenous variables. In this paper, we relax this assumption and propose a learning algorithm based on likelihood maximization to learn a prediction model. We first discuss the connections and differences between unmeasured confounding and our setting, then propose a unified method that effectively handles latent exogenous variables. Specifically, our method models the data generation process with latent exogenous variables under mild normality assumptions. We then develop a Monte Carlo algorithm to numerically estimate the likelihood function. Extensive experiments on synthetic datasets and three real-world datasets demonstrate the effectiveness of our proposed method. The code is at https://github.com/WallaceSUI/kdd25-background-variable.
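To make the Monte Carlo likelihood estimation step concrete, below is a minimal sketch, not the paper's actual model: it assumes a single latent exogenous variable `z ~ N(0, 1)` and a hypothetical observation model `y | x, z ~ N(x + z, 1)`, then estimates the marginal log-likelihood `log p(y | x) = log E_z[p(y | x, z)]` by averaging over samples of `z` (all model choices here are illustrative assumptions):

```python
import numpy as np

def mc_log_likelihood(y, x, n_samples=1000, seed=0):
    """Monte Carlo estimate of log p(y | x) = log E_z[ p(y | x, z) ],
    marginalizing a latent exogenous variable z assumed to be N(0, 1).

    The observation model y | x, z ~ N(x + z, 1) is purely illustrative.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)            # z_m ~ N(0, 1)
    mu = x + z                                    # mean of y under each sample
    # Gaussian log-density log N(y; mu, 1) for each sampled z_m
    log_p = -0.5 * np.log(2.0 * np.pi) - 0.5 * (y - mu) ** 2
    # log-mean-exp over samples for numerical stability
    m = log_p.max()
    return m + np.log(np.mean(np.exp(log_p - m)))
```

In this toy setup the marginal is available in closed form (`y | x ~ N(x, 2)`), so the estimator can be sanity-checked against the exact log-density; the paper's method applies the same Monte Carlo idea to a richer structural causal model with correlated exogenous variables.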