🤖 AI Summary
This paper addresses the limited flexibility and identifiability of imputation models for missing data. We propose the Markov missing graph (MMG) framework, which uses an undirected graph to encode conditional independence relationships and thereby decompose the imputation model locally. Combined with the Principle of Available Information (PAI), which underpins identification, imputation is formulated as an empirical risk minimization problem that accommodates a range of predictive models. On the theory side, we connect MMG to Little's complete-case missing value assumption, show recovery under missing completely at random (MCAR), and develop efficiency theory and graph-related properties. Simulation studies and an application to a real-world Alzheimer's disease data set support the validity of the method.
📝 Abstract
We introduce the Markov missing graph (MMG), a novel framework that imputes missing data based on undirected graphs. MMG leverages conditional independence relationships to decompose the imputation model locally. To establish identification, we introduce the Principle of Available Information (PAI), which guides the use of all relevant observed data. We then propose a flexible statistical learning paradigm, MMG Imputation Risk Minimization under PAI, that frames the imputation task as an empirical risk minimization problem and is adaptable to various modeling choices. We develop theory for MMG, including its connection to Little's complete-case missing value assumption, recovery under missing completely at random, efficiency theory, and graph-related properties. We demonstrate the validity of our method in simulation studies and illustrate its application with a real-world Alzheimer's data set.
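To make the idea of graph-guided imputation as risk minimization more concrete, the following is a minimal sketch and not the paper's estimator. It assumes a simplified setup that the abstract does not specify: each variable is predicted from its neighbors in a user-supplied undirected graph, squared-error loss is the empirical risk, and the model is fit on every row where the target and its neighbors are observed (a loose stand-in for using the available information). The function name `impute_by_neighbors` and the `neighbors` dictionary are hypothetical.

```python
import numpy as np

def impute_by_neighbors(X, neighbors):
    """Impute NaNs column by column from graph neighbors (illustrative only).

    X         : (n, d) float array; NaN marks a missing entry.
    neighbors : dict mapping a column index to the list of adjacent column
                indices in a hypothetical undirected graph.
    """
    X = np.asarray(X, dtype=float)
    X_imp = X.copy()
    n, d = X.shape
    for j in range(d):
        miss_j = np.isnan(X[:, j])
        if not miss_j.any():
            continue  # nothing to impute in this column
        nb = np.asarray(neighbors.get(j, []), dtype=int)
        # Rows in which every neighbor of column j is observed.
        nb_observed = ~np.isnan(X[:, nb]).any(axis=1)
        # Design matrix: intercept plus neighbor columns.
        A = np.column_stack([np.ones(n), X[:, nb]])
        # Assumption: fall back to the observed mean (or 0.0 if the column
        # is entirely missing) when the neighbors cannot be used.
        fallback = X[~miss_j, j].mean() if (~miss_j).any() else 0.0
        X_imp[miss_j, j] = fallback
        train = ~miss_j & nb_observed
        if train.sum() >= A.shape[1]:
            # Minimize the empirical squared-error risk on the usable rows.
            coef, *_ = np.linalg.lstsq(A[train], X[train, j], rcond=None)
            target = miss_j & nb_observed
            X_imp[target, j] = A[target] @ coef
    return X_imp
```

Least squares is used here only because it is the simplest risk minimizer to write down; since the framework is described as adaptable to various modeling choices, any other predictive model could in principle take its place in this sketch.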