🤖 AI Summary
This paper addresses a diagnostic ambiguity in multivariate KL divergence: the total divergence entangles marginal mismatch with statistical dependencies. The authors propose an algebraically exact, fully additive hierarchical decomposition based on Möbius inversion over the subset lattice, yielding the first closed-form, complete analytic decomposition of the KL divergence. The total divergence between a joint distribution and its product reference is rigorously disentangled into a sum of independent marginal mismatch terms and all-order (r-wise) statistical dependency terms, expressed solely in standard Shannon information measures, with no approximations or modeling assumptions. The framework unifies higher-order mutual information, total correlation, and information-geometric principles. Numerical experiments confirm machine-precision accuracy across diverse systems, substantially improving attribution-based diagnosis of divergence sources in machine learning, econometrics, and complex systems analysis.
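Concretely, in the notation of the abstract below, the two levels of the decomposition read:

$$
\mathrm{KL}\!\left(P_k \,\middle\|\, Q^{\otimes k}\right) \;=\; \underbrace{\sum_{i=1}^{k} \mathrm{KL}(P_i \,\|\, Q)}_{\text{marginal mismatch}} \;+\; \underbrace{C(P_k)}_{\text{dependence}}, \qquad C(P_k) \;=\; \sum_{r=2}^{k} I^{(r)}(P_k).
$$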
📝 Abstract
The Kullback-Leibler (KL) divergence is a foundational measure for comparing probability distributions. Yet in multivariate settings its structure is often opaque, conflating marginal mismatches with statistical dependencies. We derive an algebraically exact, additive, and hierarchical decomposition of the KL divergence between a joint distribution \( P_k \) and a product reference \( Q^{\otimes k} \). The total divergence splits into the sum of marginal KLs, \( \sum_{i=1}^k \mathrm{KL}(P_i \,\|\, Q) \), and the total correlation \( C(P_k) \), which we further decompose as \( C(P_k) = \sum_{r=2}^k I^{(r)}(P_k) \) using Möbius inversion on the subset lattice. Each \( I^{(r)} \) quantifies the distinct contribution of \( r \)-way statistical interactions to the total divergence. This yields the first decomposition of this form that is both algebraically complete and interpretable using only standard Shannon quantities, with no approximations or model assumptions. Numerical validation using hypergeometric sampling confirms exactness to machine precision across diverse system configurations. This framework enables precise diagnosis of divergence origins (marginal versus interaction) across applications in machine learning, econometrics, and complex systems.
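To make the structure concrete, here is a minimal numerical sketch on a small discrete system. It checks the first identity exactly and recovers r-wise terms by Möbius inversion of subset total correlations over the subset lattice. The specific sign convention and grouping of the \( I^{(r)} \) terms here are our assumptions, not taken from the paper, and all names (`P`, `Q`, `total_corr`, …) are illustrative.

```python
# Sketch of the two-level decomposition, assuming the r-wise terms arise from
# Moebius inversion of subset total correlations (a common convention; the
# paper's exact definition of I^(r) may differ in sign or grouping).
import itertools
import numpy as np

rng = np.random.default_rng(0)

k, m = 3, 4                       # three variables, each with four states
P = rng.random((m,) * k)          # hypothetical joint distribution on {0..m-1}^k
P /= P.sum()
Q = rng.random(m)                 # common product-reference marginal
Q /= Q.sum()

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def marginal(P, axes):
    """Marginalize the joint P onto the variable indices in `axes`."""
    drop = tuple(i for i in range(P.ndim) if i not in axes)
    return P.sum(axis=drop)

# Level 1: KL(P_k || Q^{(x)k}) = sum_i KL(P_i || Q) + C(P_k)
Qk = Q
for _ in range(k - 1):
    Qk = np.multiply.outer(Qk, Q)         # build the product reference Q x Q x Q
kl_total = np.sum(P * (np.log(P) - np.log(Qk)))

margs = [marginal(P, (i,)) for i in range(k)]
kl_marg = sum(np.sum(Pi * (np.log(Pi) - np.log(Q))) for Pi in margs)
C = sum(entropy(Pi) for Pi in margs) - entropy(P)   # total correlation
assert np.isclose(kl_total, kl_marg + C)

# Level 2: C(P_k) = sum_{r=2}^k I^(r) via Moebius inversion on the subset lattice.
def total_corr(S):
    """Total correlation of the sub-vector indexed by the subset S."""
    return sum(entropy(marginal(P, (i,))) for i in S) - entropy(marginal(P, S))

I_r = {r: 0.0 for r in range(2, k + 1)}
for r in range(2, k + 1):
    for S in itertools.combinations(range(k), r):
        # d(S) = sum_{T subset of S} (-1)^{|S|-|T|} C(T); C vanishes on singletons,
        # so the inner sum starts at |T| = 2.
        d = sum(
            (-1) ** (len(S) - s) * total_corr(T)
            for s in range(2, len(S) + 1)
            for T in itertools.combinations(S, s)
        )
        I_r[r] += d

assert np.isclose(C, sum(I_r.values()))
print({r: round(v, 6) for r, v in I_r.items()})
```

Because both identities are algebraic (the second is set-function Möbius inversion), the assertions hold to machine precision for any strictly positive `P` and `Q`, mirroring the exactness claim in the abstract.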