🤖 AI Summary
This work proposes a Fréchet regression framework for modeling the relationship between multivariate distributional responses and Euclidean predictors, based on the nonparanormal transport (NPT) metric. The approach enhances interpretability by decomposing each distributional response into its marginal distributions and its dependence structure, which are modeled separately. The NPT metric serves as a closed-form surrogate for the Wasserstein distance and is shown to be topologically equivalent to it, thereby alleviating computational and statistical challenges in high dimensions. Theoretical analysis establishes consistency and fast convergence rates for the resulting estimators. Both simulation studies and an application to continuous glucose monitoring data demonstrate the method’s scalability, effectiveness, and practical utility.
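The summary does not spell out the NPT formula. As an illustration only, the sketch below assumes an NPT-style distance that adds the squared 1-D Wasserstein-2 distances between corresponding marginals to the squared Bures-Wasserstein distance between the Gaussian-copula correlation matrices (the 2-Wasserstein distance between N(0, R_1) and N(0, R_2)). Both terms have closed forms, which is the property emphasized above. The exact combination, the rank-based (normal-scores) estimate of the copula correlation, and all function names (`npt_like_distance`, `normal_scores_corr`, `marginal_w2_sq`, `bures_sq`) are assumptions, not the paper's definitions.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.stats import norm, rankdata


def normal_scores_corr(X):
    """Rank-based (normal-scores) estimate of the Gaussian-copula correlation
    of a sample X with shape (n, p), p >= 2; invariant to monotone marginal maps."""
    n, p = X.shape
    U = np.column_stack([rankdata(X[:, j]) / (n + 1) for j in range(p)])
    return np.corrcoef(norm.ppf(U), rowvar=False)


def marginal_w2_sq(x, y, grid):
    """Squared 1-D Wasserstein-2 distance, approximated by averaging squared
    differences of empirical quantiles over a grid of probability levels."""
    return float(np.mean((np.quantile(x, grid) - np.quantile(y, grid)) ** 2))


def bures_sq(A, B):
    """Squared Bures-Wasserstein distance between PSD matrices, i.e. the squared
    2-Wasserstein distance between N(0, A) and N(0, B)."""
    rA = sqrtm(A)
    cross = sqrtm(rA @ B @ rA)
    return float(np.real(np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)))


def npt_like_distance(X, Y, m=200):
    """Hypothetical NPT-style distance between two samples of shape (n, p):
    marginal term plus dependence term, combined in quadrature."""
    grid = (np.arange(m) + 0.5) / m  # midpoints of [0, 1], avoiding 0 and 1
    marg = sum(marginal_w2_sq(X[:, j], Y[:, j], grid) for j in range(X.shape[1]))
    dep = bures_sq(normal_scores_corr(X), normal_scores_corr(Y))
    return float(np.sqrt(marg + max(dep, 0.0)))  # clip round-off below zero
```

Because the copula correlation is estimated from ranks, applying the same monotone transform (for example `np.exp`) to every column leaves the dependence term unchanged and moves only the marginal term, mirroring the marginal/dependence split described above.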
📝 Abstract
Regression with distribution-valued responses and Euclidean predictors has gained increasing scientific relevance. While methodology for univariate distributional data has advanced rapidly in recent years, multivariate distributions, which additionally encode dependence across univariate marginals, have received less attention and pose computational and statistical challenges. In this work, we address these challenges with a new regression approach for multivariate distributional responses, in which distributions are modeled within the semiparametric nonparanormal family. By incorporating the nonparanormal transport (NPT) metric, an efficient closed-form surrogate for the Wasserstein distance, into the Fréchet regression framework, our approach decomposes the problem into separate regressions of marginal distributions and their dependence structure, facilitating both efficient estimation and granular interpretation of predictor effects. We provide theoretical justification for NPT, establishing its topological equivalence to the Wasserstein distance and proving that it mitigates the curse of dimensionality. We further prove uniform convergence guarantees for the regression estimators, both when distributional responses are fully observed and when they are estimated from empirical samples, attaining fast convergence rates comparable to those in the univariate case. The utility of our method is demonstrated via simulations and an application to continuous glucose monitoring data.
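To make the "separate regressions of marginal distributions" concrete, here is a minimal sketch for a single marginal. It assumes global Fréchet regression weights and the univariate Wasserstein-2 metric on that marginal. The abstract does not say whether the global or local variant is used, and the dependence (correlation) regression is not shown. In this setting the fitted quantile function is the L2 projection of a weighted average of the observed quantile functions onto nondecreasing functions; the sketch approximates it on a shared grid with the pool-adjacent-violators algorithm. The helper names (`pava`, `global_frechet_weights`, `fit_marginal_quantile`) are hypothetical.

```python
import numpy as np


def pava(y):
    """L2 projection of a sequence onto nondecreasing sequences
    (pool-adjacent-violators algorithm, equal weights)."""
    means, counts = [], []
    for v in y:
        means.append(float(v))
        counts.append(1)
        # Merge the last two blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            c = counts[-2] + counts[-1]
            m = (counts[-2] * means[-2] + counts[-1] * means[-1]) / c
            means[-2:] = [m]
            counts[-2:] = [c]
    return np.repeat(means, counts)


def global_frechet_weights(Xpred, x):
    """Weights s_i(x) = 1 + (X_i - Xbar)^T Sigma^{-1} (x - Xbar), with Sigma the
    (1/n) sample covariance of the predictors Xpred (shape (n, d))."""
    mu = Xpred.mean(axis=0)
    Sigma = np.atleast_2d(np.cov(Xpred, rowvar=False, bias=True))
    return 1.0 + (Xpred - mu) @ np.linalg.solve(Sigma, np.asarray(x, float) - mu)


def fit_marginal_quantile(Xpred, Q, x):
    """Fitted quantile function of one marginal at predictor value x.
    Q has shape (n, m): row i is that marginal's quantile function for
    response i, evaluated on a shared grid of m probability levels."""
    s = global_frechet_weights(Xpred, x)
    weighted_avg = (s[:, None] * Q).mean(axis=0)
    return pava(weighted_avg)  # restore monotonicity if negative weights broke it
```

The weights average to one and reproduce linear trends in the predictor, so a family of responses whose quantiles shift linearly with x is recovered exactly, which makes a convenient sanity check. The isotonic projection only matters when some weights are negative, which happens for x far from the predictor mean.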