Tree-Embedded Bayesian Factor Models for Multidimensional Categorical Distributions

📅 2026-03-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the challenge posed by multi-source grouped data—such as regional distributions of age or income—that often lack clear clustering structures, thereby limiting the effectiveness of conventional mixture models. To overcome this, the authors propose a Bayesian hierarchical modeling approach based on tree embeddings. Specifically, categorical distributions are mapped into Euclidean space via tree embeddings, circumventing restrictive parametric assumptions. Within this embedded space, a Bayesian latent factor model is constructed, augmented with a spatial autoregressive (SAR) prior to explicitly capture inter-regional dependencies. Empirical evaluations on real-world demographic data demonstrate that the proposed model significantly outperforms standard Dirichlet mixture models and parametric baselines, achieving both flexibility and efficiency in jointly modeling shared patterns and heterogeneity across regions.

Technology Category

Application Category

📝 Abstract
Analyzing data collected from multiple sources to estimate common and heterogeneous structures through a hierarchical model is a central task in Bayesian inference, and to this end, Bayesian factor models are one of the most widely used tools for this purpose. In this paper, we propose a new Bayesian latent factor model for distributions, providing a parsimonious model for describing many observed distributions through lower-dimensional structures. Many applications are found in the social science in the form of grouped data, for example, distributions of age composition and income observed across locations. In these contexts, standard mixture models can be inefficient because the distributions do not necessarily exhibit clear clustering structures. To overcome the difficulty, we introduce a tree-based transformation that embeds distributions into a Euclidean space and construct a Bayesian latent factor model in the transformed space. Once transformed into Euclidean vectors, the Bayesian hierarchical model can be extended in a straightforward manner. As an illustration, we incorporate spatial dependence by introducing a prior based on a simultaneous autoregressive (SAR) model. The proposed model is "nonparametric" in the sense that it does not impose parametric assumptions on the form of the observed distributions. Through numerical experiments using real population data, we demonstrate that the proposed model outperforms the standard Dirichlet mixture model as well as models built on parametric assumptions.
Problem

Research questions and friction points this paper is trying to address.

Bayesian factor models
multidimensional categorical distributions
nonparametric modeling
hierarchical modeling
distributional data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tree-embedded transformation
Bayesian latent factor model
Nonparametric distribution modeling
Hierarchical Bayesian inference
Spatial dependence
🔎 Similar Papers
No similar papers found.
Naoki Awaya
Naoki Awaya
Waseda University
Statistics
K
Keisuke Sasaki
Graduate School of Economics, The University of Tokyo
G
Genya Kobayashi
School of Commerce, Meiji University
Shonosuke Sugasawa
Shonosuke Sugasawa
Faculty of Economics, Keio University
Bayesian statisticsHierarchical modelingSpatio-temporal statistics