Tree-Based Predictive Models for Noisy Input Data

📅 2026-03-08
🤖 AI Summary
This study addresses covariate measurement error in high-noise settings, where it degrades parameter estimation, uncertainty quantification, and predictive robustness in conventional models. The authors propose meBART, a nonparametric approach that builds a measurement error model directly into the Bayesian Additive Regression Trees (BART) framework. Rather than treating noisy covariates as exact, meBART models the error-prone inputs jointly with the response and uses posterior inference to propagate input uncertainty into the predictions. Simulation studies and two biomedical applications indicate that, in the presence of measurement error, meBART yields more accurate parameter estimates, more reliable uncertainty quantification, and better predictive performance.
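
To make the summary concrete, the block below writes out the standard BART sum-of-trees model alongside a classical additive measurement error model. This is only an illustrative sketch: the summary does not state the exact error structure meBART uses, so the Gaussian additive form for $u_i$ and the symbols $w_i$, $\sigma_u^2$ are assumptions made here, not the authors' specification.

```latex
% Standard BART: the response is a sum of m regression trees evaluated at the true covariate x_i
y_i = \sum_{j=1}^{m} g(x_i;\, T_j, M_j) + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

% Assumed classical measurement error: only the noisy surrogate w_i is observed
w_i = x_i + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

Under this formulation the true covariate $x_i$ is latent, so inference has to average over its plausible values given $w_i$ instead of plugging $w_i$ into the trees directly.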

📝 Abstract
Measurement error is prevalent across all domains of scientific research where only imprecise observations, rather than the true underlying values, can be obtained. For example, estimates of human microbiome diversity are based on small samples from a much larger, generally unobserved system and reflect both sampling error and technical variation. In high-noise settings like these, it becomes difficult to make accurate predictions and to summarize uncertainty. Methods have previously been proposed to accommodate measurement error in classic predictive models, such as linear regression. However, relatively little work has been done to address measurement error in more complex and flexible models. Bayesian additive regression trees (BART), a Bayesian nonparametric model that sums the output of many decision trees, offers robust predictions with built-in uncertainty quantification. In this work, we propose measurement error BART (meBART), a novel extension to the BART model that directly incorporates measurement error in the independent variable(s). Through simulation studies, we show that in the presence of measurement error, our model enables more accurate parameter estimation, more robust uncertainty quantification, and superior predictive performance. We illustrate the utility of our proposed approach through two biomedical applications where the predictors of interest are subject to measurement error.
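
The effect of measurement error on a classic predictive model can be seen in a small simulation. The sketch below is not from the paper: it uses simple linear regression with a classical additive Gaussian error, and every name and parameter value in it (`beta`, `sigma_x`, `sigma_u`, `ols_slope`) is an assumption chosen for illustration. It shows the well-known attenuation of the fitted slope when the noisy covariate is used in place of the true one, and a correction that assumes the error variance is known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative settings (assumptions, not values from the paper)
n = 20_000
beta = 2.0      # true slope
sigma_x = 1.0   # std. dev. of the true covariate
sigma_u = 1.0   # std. dev. of the measurement error

x = rng.normal(0.0, sigma_x, n)        # true, unobserved covariate
w = x + rng.normal(0.0, sigma_u, n)    # observed, error-prone covariate
y = beta * x + rng.normal(0.0, 0.5, n)


def ols_slope(pred, resp):
    """Least-squares slope of resp on pred (with intercept)."""
    c = np.cov(pred, resp)
    return c[0, 1] / c[0, 0]


slope_true = ols_slope(x, y)    # regression on the true covariate
slope_noisy = ols_slope(w, y)   # naive regression on the noisy covariate

# Reliability ratio: the factor by which the naive slope is attenuated
reliability = sigma_x**2 / (sigma_x**2 + sigma_u**2)

# Correction that is only possible when the error variance is known
slope_corrected = slope_noisy / reliability

print(f"slope using true x:    {slope_true:.3f}")
print(f"slope using noisy w:   {slope_noisy:.3f}")
print(f"slope after correction: {slope_corrected:.3f}")
```

With equal covariate and error variances the reliability ratio is 0.5, so the naive slope is pulled roughly halfway toward zero. A flexible model such as BART has no single slope to rescale, which is one reason measurement error needs to be handled inside the model rather than corrected after fitting.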

Problem

Research questions and friction points this paper is trying to address.

measurement error
noisy input data
predictive modeling
uncertainty quantification
Bayesian additive regression trees

Innovation

Methods, ideas, or system contributions that make the work stand out.

measurement error
Bayesian additive regression trees
uncertainty quantification
noisy input data
nonparametric modeling
Kevin McCoy
Department of Statistics, Rice University
Zachary Wooten
Department of Biostatistics, St. Jude Children’s Research Hospital
Christine B. Peterson
Associate Professor of Biostatistics, University of Texas MD Anderson Cancer Center
graphical models, variable selection, Bayesian statistics, microbiome data