AI Summary
This work addresses the inherent rank unidentifiability and solution non-uniqueness of nonnegative matrix factorization (NMF). We establish a theoretical connection between NMF and the principle of the common cause (PCC) from causal inference, and use PCC to guide model design. Specifically, we propose a PCC-based method for estimating the effective rank that enforces positive-correlation constraints on joint probabilities, thereby improving NMF's robustness to noise. Furthermore, we formulate NMF as an approximate probabilistic realization of PCC, which enhances the causal interpretability of the latent factors. Empirical evaluation on image data demonstrates that the method stably extracts reproducible basis images, effectively separates noise from underlying structure, and significantly improves clustering consistency and feature stability. The core contribution is the first bidirectional theoretical bridge linking NMF and PCC, integrating dimensionality-reduction performance with causal interpretability.
Abstract
Nonnegative matrix factorization (NMF) is a well-known unsupervised data-reduction method. The principle of the common cause (PCC) is a basic methodological approach in probabilistic causality, which seeks an independent mixture model for the joint probability of two dependent random variables. It turns out that these two concepts are closely related. This relationship is explored reciprocally for several datasets of gray-scale images, which are conveniently mapped into probability models. On one hand, PCC provides a predictability tool that leads to a robust estimation of the effective rank of NMF. Unlike other estimates (e.g., those based on the Bayesian Information Criterion), our estimate of the rank is stable against weak noise. We show that NMF implemented around this rank produces features (basis images) that are also stable against noise and against seeds of local optimization, thereby effectively resolving the NMF nonidentifiability problem. On the other hand, NMF provides an interesting possibility of implementing PCC in an approximate way, where larger and positively correlated joint probabilities tend to be explained better via the independent mixture model. We work out a clustering method, where data points with the same common cause are grouped into the same cluster. We also show how NMF can be employed for data denoising.
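To make the link between the two concepts concrete, the following is a minimal sketch, not the authors' implementation. It assumes a gray-scale image is mapped into a joint probability P(x, y) by dividing pixel intensities by their total (the abstract does not specify the mapping), and it uses the standard multiplicative updates for KL-divergence NMF as a stand-in optimizer. The function name `pcc_mixture_via_nmf` and its parameters are hypothetical. The factors are rescaled so that they read as an independent mixture model, P(x, y) ≈ Σ_c p(c) p(x|c) p(y|c), where c plays the role of the common cause.

```python
import numpy as np


def pcc_mixture_via_nmf(image, rank, n_iter=500, seed=0, eps=1e-12):
    """Approximate P(x, y) by sum_c p(c) p(x|c) p(y|c) via KL-divergence NMF.

    Hypothetical helper for illustration only; the normalization of the
    image into a probability table is an assumption, not the paper's recipe.
    """
    A = np.asarray(image, dtype=float)
    if A.ndim != 2 or (A < 0).any() or A.sum() <= 0:
        raise ValueError("image must be a 2-D non-negative array with positive sum")

    # Assumed mapping: pixel intensities normalized to a joint probability table.
    P = A / A.sum()

    rng = np.random.default_rng(seed)
    W = rng.random((P.shape[0], rank)) + eps
    H = rng.random((rank, P.shape[1])) + eps

    # Standard multiplicative updates for the KL objective (local optimization,
    # so the result depends on the seed).
    for _ in range(n_iter):
        R = P / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
        R = P / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)

    # Rescale the factors into probabilities: P ~ sum_c p(c) p(x|c) p(y|c).
    w = np.maximum(W.sum(axis=0), eps)
    h = np.maximum(H.sum(axis=1), eps)
    p_c = w * h
    p_c = p_c / p_c.sum()
    p_x_given_c = W / w            # each column sums to 1
    p_y_given_c = H / h[:, None]   # each row sums to 1
    return p_c, p_x_given_c, p_y_given_c


# Toy 4x4 "image" with two bright blocks, i.e. two plausible common causes.
toy = np.zeros((4, 4))
toy[:2, :2] = 1.0
toy[2:, 2:] = 1.0
p_c, p_x_c, p_y_c = pcc_mixture_via_nmf(toy, rank=2)

# Reconstructed joint probability under the independent mixture model.
Q = (p_x_c * p_c) @ p_y_c
```

Under this reading, the NMF rank is the number of values the common cause can take, which is why a stable rank estimate matters. A clustering rule in the spirit of the abstract could assign each entry (x, y) to the cause c that maximizes p(c) p(x|c) p(y|c), though the paper's exact rule may differ.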