Missing value imputation with adversarial random forests -- MissARF

📅 2025-07-21
🤖 AI Summary
Missing data imputation in biostatistical analysis often involves a trade-off between accuracy and computational efficiency. This paper proposes MissARF, a generative imputation method based on adversarial random forests (ARF) that supports both single and multiple imputation. Its core idea is to use an ARF to efficiently model high-dimensional conditional distributions and to generate imputed values directly by conditional sampling. For multiple imputation, the same fitted ARF is reused, so additional imputations require no additional model fitting. Experiments on diverse real-world and synthetic datasets show that MissARF achieves imputation accuracy comparable to state-of-the-art methods, including MICE and GAIN, while running one to two orders of magnitude faster. The efficiency gain is especially pronounced for multiple imputation. MissARF is also easy to use and scalable, making it suitable for large-scale biostatistical applications.

📝 Abstract
Handling missing values is a common challenge in biostatistical analyses, typically addressed by imputation methods. We propose a novel, fast, and easy-to-use imputation method called missing value imputation with adversarial random forests (MissARF), based on generative machine learning, that provides both single and multiple imputation. MissARF employs an adversarial random forest (ARF) for density estimation and data synthesis. To impute a missing value of an observation, we condition on the non-missing values and sample from the estimated conditional distribution generated by the ARF. Our experiments demonstrate that MissARF performs comparably to state-of-the-art single and multiple imputation methods in terms of imputation quality while offering fast runtime, with no additional cost for multiple imputation.
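
The conditional-sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not MissARF's implementation: it assumes a forest has already been fitted and summarized as a list of leaves, each with a positive weight (its share of the training data) and one independent Gaussian per feature. The names `Leaf`, `impute_once`, and `multiple_impute` are hypothetical. The observed values reweight the leaves by their likelihood, a leaf is drawn from these posterior weights, and each missing feature is drawn from that leaf's marginal. For simplicity, the sketch uses untruncated Gaussians and ignores the leaf boundaries a real forest would impose.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Leaf:
    """Hypothetical summary of one fitted leaf (assumption, not MissARF's API).

    weight: positive share of training rows that fall into this leaf
    means, sds: one univariate Gaussian per feature; features are treated
    as independent within the leaf
    """
    weight: float
    means: list[float]
    sds: list[float]


def log_normal_pdf(x, mu, sd):
    """Log density of a univariate Gaussian."""
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)


def impute_once(row, leaves, rng):
    """Fill the None entries of `row` by sampling from p(missing | observed)."""
    observed = [j for j, v in enumerate(row) if v is not None]
    # Posterior leaf weight = prior weight * likelihood of the observed values,
    # computed in log space to avoid underflow with many observed features.
    log_w = [
        math.log(leaf.weight)
        + sum(log_normal_pdf(row[j], leaf.means[j], leaf.sds[j]) for j in observed)
        for leaf in leaves
    ]
    top = max(log_w)
    post = [math.exp(lw - top) for lw in log_w]  # largest weight becomes 1.0
    leaf = rng.choices(leaves, weights=post, k=1)[0]
    # Within the chosen leaf, each missing feature is drawn from its own marginal.
    return [
        rng.gauss(leaf.means[j], leaf.sds[j]) if v is None else v
        for j, v in enumerate(row)
    ]


def multiple_impute(row, leaves, m, seed=0):
    """Draw m completed copies of `row`, reusing the same fitted leaves."""
    rng = random.Random(seed)
    return [impute_once(row, leaves, rng) for _ in range(m)]
```

Because the leaves are fitted once, producing m completed datasets only repeats the sampling step, which mirrors why multiple imputation needs no additional model fitting. For example, with one leaf centered at (0, 0) and another at (10, 10), a row `[9.8, None]` almost always selects the second leaf, so the imputed value lands near 10.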

Problem

Develops MissARF for missing value imputation
Uses adversarial random forests for density estimation
Performs comparably to state-of-the-art imputation methods

Innovation

Adversarial random forests for density estimation
Conditional sampling from estimated distributions
Fast runtime, with no additional cost for multiple imputation
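
The density-estimation idea can be illustrated with a single tree. In this hedged sketch, the leaf assignments are assumed to come from an already trained forest (the adversarial training loop that produces them is omitted), and `fit_leaf_densities` and `density` are hypothetical names. Each leaf contributes its coverage (fraction of training rows) times a product of per-feature Gaussians, reflecting the assumption that features are approximately independent within a leaf. A full ARF would combine such mixtures over all trees and would typically restrict each univariate density to the leaf's bounds; both refinements are left out here.

```python
import math
import statistics
from collections import defaultdict


def fit_leaf_densities(rows, leaf_ids, min_sd=1e-3):
    """Estimate a coverage-weighted mixture from one tree's leaf assignments.

    rows: list of equal-length numeric feature vectors
    leaf_ids: leaf_ids[i] is the (hypothetical) leaf that rows[i] falls into
    Returns {leaf_id: (coverage, means, sds)}.
    """
    groups = defaultdict(list)
    for row, leaf in zip(rows, leaf_ids):
        groups[leaf].append(row)
    n = len(rows)
    params = {}
    for leaf, members in groups.items():
        columns = list(zip(*members))  # transpose: one tuple per feature
        means = [statistics.fmean(col) for col in columns]
        # Floor the spread so single-row or constant leaves stay well defined.
        sds = [max(statistics.pstdev(col), min_sd) for col in columns]
        params[leaf] = (len(members) / n, means, sds)
    return params


def density(x, params):
    """Mixture density: sum over leaves of coverage * product of marginals."""
    total = 0.0
    for coverage, means, sds in params.values():
        component = coverage
        for xj, mu, sd in zip(x, means, sds):
            z = (xj - mu) / sd
            component *= math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))
        total += component
    return total
```

For example, if leaf 0 holds rows [0, 0] and [2, 2] and leaf 1 holds [10, 10] and [12, 12], each leaf has coverage 0.5, means equal to its center, and unit standard deviation, so the density at [1, 1] is dominated by leaf 0 and is approximately 0.5/(2π) ≈ 0.0796.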