🤖 AI Summary
This paper addresses the problem of modeling continuous, discrete, and mixed-type conditional densities given scalar covariates when only samples from the conditional distributions are observed. We propose a structured additive regression framework formulated in a Bayes Hilbert space, in which non-negativity and integration to one are preserved under the space's addition and scalar multiplication, so that continuous, discrete, and mixed densities can be handled within a single additive model. Theoretically, we establish asymptotic existence, uniqueness, consistency, and asymptotic normality of the penalized maximum likelihood estimator. Computationally, estimation maximizes the penalized log-likelihood of a multinomial or, equivalently, Poisson regression model that approximates the original penalized likelihood problem, with penalty terms controlling model complexity. The method supports statistical inference on effect densities and the construction of confidence regions. Empirically, the approach is applied to SOEP data on the woman's share in a couple's total labor income (a mixed distribution on [0,1] with point masses at zero and one), modeling covariate effects of year, place of residence, and age of the youngest child.
📝 Abstract
We present a structured additive regression approach to model conditional densities given scalar covariates, where only samples of the conditional distributions are observed. This links our approach to distributional regression models for scalar data. The model is formulated in a Bayes Hilbert space -- preserving nonnegativity and integration to one under summation and scalar multiplication -- with respect to an arbitrary finite measure. This allows us to consider, amongst others, continuous, discrete and mixed densities. Our theoretical results include asymptotic existence, uniqueness, consistency, and asymptotic normality of the penalized maximum likelihood estimator, as well as confidence regions and inference for the (effect) densities. For estimation, we propose to maximize the penalized log-likelihood corresponding to an appropriate multinomial, or equivalently, Poisson regression model, which we show to approximate the original penalized maximum likelihood problem. We apply our framework to a motivating gender economics data set from the German Socio-Economic Panel Study (SOEP), analyzing the distribution of the woman's share in a couple's total labor income given covariate effects for year, place of residence and age of the youngest child. As the income share is a continuous variable having discrete point masses at zero and one for single-earner couples, the corresponding densities are of mixed type.
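To illustrate what "preserving nonnegativity and integration to one under summation and scalar multiplication" can look like in practice, the following minimal Python sketch discretizes a mixed reference measure on [0, 1] (unit point masses at 0 and 1 plus Lebesgue measure in between) and implements the Bayes Hilbert space operations as renormalized pointwise products and powers. This is not the authors' implementation: the function names, the midpoint-rule discretization, the unit weights on the point masses, and the example densities are all assumptions made for illustration. The centered log-ratio (`clr`) map is the transformation commonly used in the Bayes space literature to turn these operations into ordinary addition and scaling; it is included here as an assumption about how an additive model on densities would be represented.

```python
import math

def mixed_measure(n_bins=100):
    """Hypothetical discretized reference measure: Dirac masses at 0 and 1
    plus Lebesgue measure on (0, 1), approximated by the midpoint rule."""
    width = 1.0 / n_bins
    points = [0.0] + [(i + 0.5) * width for i in range(n_bins)] + [1.0]
    weights = [1.0] + [width] * n_bins + [1.0]
    return points, weights

def integrate(values, weights):
    """Integral of a function (given by its values) w.r.t. the measure."""
    return sum(v * w for v, w in zip(values, weights))

def normalize(values, weights):
    """Rescale positive values so that they integrate to one."""
    total = integrate(values, weights)
    return [v / total for v in values]

def perturb(f, g, weights):
    """Bayes Hilbert space 'addition': pointwise product, renormalized."""
    return normalize([a * b for a, b in zip(f, g)], weights)

def power(alpha, f, weights):
    """Bayes Hilbert space 'scalar multiplication': pointwise power, renormalized."""
    return normalize([a ** alpha for a in f], weights)

def clr(f, weights):
    """Centered log-ratio: log f minus its mean under the reference measure."""
    logs = [math.log(a) for a in f]
    mean_log = integrate(logs, weights) / sum(weights)
    return [v - mean_log for v in logs]

# Two made-up strictly positive mixed densities on [0, 1].
points, weights = mixed_measure()
f = normalize([math.exp(-((t - 0.4) ** 2) / 0.02) + 0.1 for t in points], weights)
g = normalize([1.0 + t for t in points], weights)

# A "linear combination" of densities in the Bayes Hilbert space sense.
h = perturb(power(2.0, f, weights), g, weights)

# Under clr, the same combination is an ordinary linear combination.
clr_h = clr(h, weights)
clr_sum = [2.0 * a + b for a, b in zip(clr(f, weights), clr(g, weights))]
```

In this sketch the first and last entries of each density are the probabilities of the point masses at zero and one, while the interior entries are ordinary density values; `h` is again a valid mixed density, and `clr_h` agrees with `clr_sum` up to floating-point error.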