🤖 AI Summary
This paper addresses matrix completion under observation bias, i.e., non-random missingness in observed entries. We propose Mask Nearest Neighbor (MNN), a two-stage algorithm: first, unsupervised recovery of latent factor distance structure from the binary observation mask; second, nonparametric regression leveraging this distance as a feature under sparse labeling. Our key contribution is the first formal modeling of observation bias and the target matrix as sharing the same latent factor structure; exploiting their intrinsic coupling enables a finite-sample error bound approaching that of supervised learning. The theoretical analysis integrates tools from matrix estimation, nonparametric statistics, and binary noise modeling. Empirically, on real-world datasets, MNN reduces mean squared error by up to 28× compared to conventional methods, demonstrating that observation bias can be systematically modeled and leveraged, not merely treated as nuisance noise.
📝 Abstract
We consider a variant of matrix completion in which entries are revealed in a biased manner. We wish to understand the extent to which such bias can be exploited to improve predictions. To that end, we propose a natural model in which the observation pattern and the outcome of interest are driven by the same set of underlying latent (or unobserved) factors. We devise Mask Nearest Neighbor (MNN), a novel two-stage matrix completion algorithm: first, it recovers (distances between) the latent factors by applying matrix estimation to the fully observed noisy binary matrix corresponding to the observation pattern; second, it uses the recovered latent factors as features and the sparsely observed noisy outcomes as labels to perform nonparametric supervised learning. Our analysis reveals that MNN enjoys entry-wise finite-sample error rates that are competitive with the corresponding supervised learning parametric rates. Despite lacking access to the latent factors and dealing with biased observations, MNN achieves this competitive performance solely by exploiting the information shared between the bias and the outcomes. Finally, through empirical evaluation on a real-world dataset, we find that MNN's estimates have 28x smaller mean squared error than traditional matrix completion methods, suggesting the utility of the model and method proposed in this work.
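The two-stage procedure described above can be illustrated with a minimal sketch. This is not the paper's actual estimator: here Stage 1 uses a plain truncated SVD of the mask as the matrix-estimation step, and Stage 2 uses simple k-nearest-neighbor averaging over rows; the function name, rank, and neighbor count are all illustrative assumptions.

```python
import numpy as np

def mask_nearest_neighbor(Y, A, rank=3, k=5):
    """Hypothetical sketch of the two-stage MNN idea.

    Y : outcome matrix, np.nan at unobserved entries
    A : binary observation mask (fully observed)
    """
    n, m = A.shape

    # Stage 1: estimate latent row factors from the mask.
    # Here: truncated SVD of A as a stand-in for matrix estimation.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    row_emb = U[:, :rank] * s[:rank]  # latent row embeddings

    # Pairwise distances between rows in the recovered latent space.
    d = np.linalg.norm(row_emb[:, None, :] - row_emb[None, :, :], axis=2)

    # Stage 2: nonparametric (k-NN) regression using those distances,
    # with sparsely observed outcomes as labels.
    Y_hat = np.array(Y, dtype=float)
    for i in range(n):
        neighbors = np.argsort(d[i])  # rows closest to i first
        for j in range(m):
            if np.isnan(Y_hat[i, j]):
                # Average observed outcomes in column j over nearest rows.
                vals = [Y[u, j] for u in neighbors
                        if not np.isnan(Y[u, j])][:k]
                if vals:
                    Y_hat[i, j] = np.mean(vals)
    return Y_hat
```

The key point the sketch captures is that Stage 2 never sees the true latent factors; it relies only on distances recovered from the observation mask, which is what lets the shared latent structure between bias and outcome be exploited.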