🤖 AI Summary
This work addresses the problem of computing the total variation distance between two mixtures of product distributions over discrete domains. For the general case over $[q]^n$, it presents a randomized algorithm that achieves a $(1\pm\varepsilon)$ multiplicative approximation in time $\mathrm{poly}((nq)^{k_1+k_2},1/\varepsilon)$, where $k_1$ and $k_2$ denote the number of components in each mixture; this is polynomial when $k_1+k_2$ is constant. For the special case of mixtures of Boolean subcubes over $\{0,1\}^n$, the paper gives a deterministic algorithm that computes the distance exactly in $\mathrm{poly}(n,2^{O(k_1+k_2)})$ time. It also shows that exact computation for this class is $\#\mathsf{P}$-hard when $k_1+k_2=\Theta(n)$, which marks the boundary of efficient exact computation for this fundamental statistical task.
📝 Abstract
We study the problem of approximating the total variation distance between two mixtures of product distributions over an $n$-dimensional discrete domain. Given two mixtures $\mathbb{P}$ and $\mathbb{Q}$ with $k_1$ and $k_2$ product distributions over $[q]^n$, respectively, we give a randomized algorithm that approximates $d_{\mathrm{TV}}\left({\mathbb{P}},{\mathbb{Q}}\right)$ within a multiplicative error of $(1\pm \varepsilon)$ in time $\mathrm{poly}((nq)^{k_1+k_2},1/\varepsilon)$. We also study the special case of mixtures of Boolean subcubes over $\{0,1\}^n$. For this class, we give a deterministic algorithm that exactly computes the total variation distance in time $\mathrm{poly}(n,2^{O(k_1+k_2)})$, and show that exact computation is $\#\mathsf{P}$-hard when $k_1+k_2=\Theta(n)$.
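To make the quantity concrete, the following is a minimal brute-force sketch of $d_{\mathrm{TV}}(\mathbb{P},\mathbb{Q})$ for two mixtures of product distributions. It is not the algorithm from the paper: it enumerates all $q^n$ points and so only works for tiny $n$, which is exactly the cost the paper's algorithms avoid. The function names, the `alphabet_size` parameter, and the nested-list representation of components are assumptions made for illustration.

```python
from itertools import product
from math import prod


def mixture_pmf(weights, components, x):
    """Probability of point x under a mixture of product distributions.

    components[j][i][a] is the probability that coordinate i takes value a
    under the j-th product distribution (hypothetical representation).
    """
    return sum(
        w * prod(comp[i][a] for i, a in enumerate(x))
        for w, comp in zip(weights, components)
    )


def tv_distance_bruteforce(p_weights, p_comps, q_weights, q_comps, alphabet_size):
    """Exact TV distance by summing |P(x) - Q(x)| / 2 over all points.

    Runs in time exponential in n; a reference baseline only.
    """
    n = len(p_comps[0])
    total = 0.0
    for x in product(range(alphabet_size), repeat=n):
        total += abs(
            mixture_pmf(p_weights, p_comps, x)
            - mixture_pmf(q_weights, q_comps, x)
        )
    return 0.5 * total


# P: a single product distribution, uniform on {0,1}^2 (k_1 = 1).
p_weights = [1.0]
p_comps = [[[0.5, 0.5], [0.5, 0.5]]]

# Q: an equal mixture of the point masses at (0,0) and (1,1) (k_2 = 2).
q_weights = [0.5, 0.5]
q_comps = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]

tv = tv_distance_bruteforce(p_weights, p_comps, q_weights, q_comps, alphabet_size=2)
print(tv)  # 0.5
```

In this example each of the four points contributes $|P(x)-Q(x)| = 0.25$, so the sum is $1$ and the distance is $0.5$.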