🤖 AI Summary
This work addresses the problem of learning and sampling from probability distributions on the simplex, including distributions over categorical data. The method performs density modeling in Euclidean space in three steps: it maps the open simplex to ℝ^(K−1) through a smooth bijection defined using Aitchison geometry; it uses Dirichlet interpolation to dequantize discrete categories into continuous observations; and it fits the resulting density with flow matching. Inverting the mapping returns samples to the simplex, and the original discrete distribution can be recovered exactly, so the method avoids modeling directly on the simplex with Riemannian geometry or custom noise processes. On synthetic and real-world datasets, the approach is competitive with existing baselines.
📝 Abstract
We propose a method for learning and sampling from probability distributions supported on the simplex. Our approach maps the open simplex to Euclidean space via smooth bijections defined using the Aitchison geometry. It supports categorical data through a Dirichlet interpolation that dequantizes discrete observations into continuous ones. This enables density modeling in Euclidean space through the bijection while still allowing exact recovery of the original discrete distribution. Compared to previous methods that operate on the simplex using Riemannian geometry or custom noise processes, our approach works in Euclidean space while respecting the Aitchison geometry, and achieves competitive performance on both synthetic and real-world datasets.
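To make the simplex-to-Euclidean mapping concrete, the sketch below implements the isometric log-ratio (ILR) transform, a standard smooth bijection between the open simplex and ℝ^(K−1) that is an isometry for the Aitchison geometry. The abstract does not state which bijections the authors use, so ILR is an assumption chosen for illustration, and the names `helmert_basis`, `ilr`, and `ilr_inv` are hypothetical.

```python
import numpy as np


def helmert_basis(K):
    """Orthonormal basis (shape (K-1, K)) of the subspace of R^K orthogonal to the ones vector."""
    V = np.zeros((K - 1, K))
    for i in range(1, K):
        V[i - 1, :i] = 1.0 / i          # equal positive weights on the first i parts
        V[i - 1, i] = -1.0              # contrasted against part i
        V[i - 1] *= np.sqrt(i / (i + 1.0))  # normalize the row to unit length
    return V


def ilr(x, V):
    """Map points on the open simplex (last axis sums to 1, all entries > 0) to R^(K-1)."""
    logx = np.log(x)
    clr = logx - logx.mean(axis=-1, keepdims=True)  # centered log-ratio coordinates
    return clr @ V.T


def ilr_inv(y, V):
    """Map points in R^(K-1) back to the open simplex (a softmax of the lifted coordinates)."""
    z = y @ V
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Illustrative usage: round-trip a few random simplex points with K = 4.
rng = np.random.default_rng(0)
x = rng.dirichlet(np.ones(4), size=5)
V = helmert_basis(4)
y = ilr(x, V)           # unconstrained coordinates in R^3
x_back = ilr_inv(y, V)  # back on the simplex
```

Under this assumption, any Euclidean density model can be trained on `y`, and its samples can be pushed through `ilr_inv` to obtain points on the simplex. Because the rows of `V` are orthonormal, Euclidean distances between ILR coordinates equal Aitchison distances between the corresponding compositions.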