AI Summary
This work addresses the problem of distribution regression, where the goal is to predict a scalar response from distribution-valued covariates, with the response depending on population-level rather than individual-level features. The authors propose DistBART, a method that models the regression functional as a linear functional whose Riesz representer is assigned a Bayesian Additive Regression Trees (BART) prior, marking the first integration of BART into the distribution regression framework. By establishing a theoretical connection to kernel methods and introducing a variant capable of learning nonlinear functionals, DistBART adaptively captures low-dimensional marginal structure in the input distributions. Leveraging random feature approximations, inference is recast as sparse Bayesian linear regression, balancing computational efficiency, uncertainty quantification, and posterior convergence, as demonstrated on both synthetic and real-world datasets.
Abstract
Distribution regression, where the goal is to predict a scalar response from a distribution-valued predictor, arises naturally in settings where observations are grouped and outcomes depend on group-level characteristics rather than on individual measurements. We introduce DistBART, a Bayesian nonparametric approach to distribution regression that models the regression function as a linear functional with the Riesz representer assigned a Bayesian additive regression trees (BART) prior. We argue that shallow decision tree ensembles encode reasonable inductive biases for tabular data, making them appropriate in settings where the functional depends primarily on low-dimensional marginals of the distributions. We show this both empirically on synthetic and real data and theoretically through an adaptive posterior concentration result. We also establish connections to kernel methods, and use this connection to motivate variants of DistBART that can learn nonlinear functionals. To enable scalability to large datasets, we develop a random-feature approximation that samples trees from the BART prior and reduces inference to sparse Bayesian linear regression, achieving computational efficiency while retaining uncertainty quantification.
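The random-feature idea described above can be sketched in a few lines: draw random shallow trees, featurize each distribution by the expected leaf occupancy of its samples (a linear functional of the distribution), and fit a Bayesian linear regression on those features. This is a minimal illustration, not the paper's implementation: the tree-sampling scheme below is a hypothetical stand-in for draws from the BART prior, and a dense conjugate Gaussian prior replaces the sparse prior used by DistBART.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_tree_features(dist_samples, n_trees=50, depth=2, d=1):
    """Featurize a list of distributions (each an (m, d) sample array)
    with random shallow trees -- a simplified stand-in for sampling trees
    from the BART prior. Each feature is a leaf-occupancy frequency,
    i.e. the expectation of a leaf indicator under the distribution."""
    # Random axis-aligned splits: one (axis, threshold) pair per level.
    axes = rng.integers(0, d, size=(n_trees, depth))
    thresh = rng.normal(size=(n_trees, depth))
    feats = []
    for X in dist_samples:
        f = []
        for t in range(n_trees):
            # Leaf index = binary code of the `depth` split outcomes.
            code = np.zeros(len(X), dtype=int)
            for lvl in range(depth):
                code = 2 * code + (X[:, axes[t, lvl]] > thresh[t, lvl])
            # Empirical leaf frequencies under this distribution.
            f.append(np.bincount(code, minlength=2 ** depth) / len(X))
        feats.append(np.concatenate(f))
    return np.asarray(feats)

def bayes_linreg(Phi, y, alpha=1.0, beta=10.0):
    """Conjugate Bayesian linear regression: N(0, alpha^-1 I) prior on
    weights, noise precision beta; returns posterior mean and covariance.
    (DistBART uses a sparse prior instead; this is a simplification.)"""
    A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    Sigma = np.linalg.inv(A)
    mu = beta * Sigma @ Phi.T @ y
    return mu, Sigma

# Toy problem: each observation is a group of draws from N(theta_i, 1);
# the response depends on the group-level parameter theta_i.
theta = rng.uniform(-2, 2, size=200)
dists = [rng.normal(th, 1.0, size=(100, 1)) for th in theta]
y = np.sin(theta) + 0.05 * rng.normal(size=len(theta))

Phi = random_tree_features(dists)
mu, Sigma = bayes_linreg(Phi[:150], y[:150])
pred = Phi[150:] @ mu  # posterior-mean prediction on held-out groups
```

Because each feature is an expectation of a fixed indicator, the resulting regression is indeed a linear functional of the input distribution, and inference reduces to linear regression in the random features, which is the source of the method's scalability.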