🤖 AI Summary
This paper studies parameter estimation of an unknown product distribution $P$ over the Boolean hypercube, given i.i.d. samples from $P$ and a potentially biased prior recommendation $Q$. The goal is to estimate $P$ within total variation distance $\varepsilon$. We propose an adaptive statistical estimation framework that quantifies prior quality via the $\ell_1$ distance between the mean vectors of $Q$ and $P$. Under a mild condition on this bias, our algorithm achieves sample complexity $\tilde{O}(d^{1-\eta}/\varepsilon^2)$, circumventing the classical $\Omega(d/\varepsilon^2)$ lower bound for product distribution learning and attaining sublinear dependence on the dimension $d$. The key contribution lies in rigorously converting imprecise prior information into provable statistical gains—specifically, accelerating learning in high-dimensional settings where the prior exhibits sparsity or only mild bias. This is the first sublinear-in-$d$ sample complexity for this problem under imperfect priors.
📝 Abstract
Given i.i.d.~samples from an unknown distribution $P$, the goal of distribution learning is to recover the parameters of a distribution that is close to $P$. When $P$ belongs to the class of product distributions on the Boolean hypercube $\{0,1\}^d$, it is known that $\Omega(d/\varepsilon^2)$ samples are necessary to learn $P$ within total variation (TV) distance $\varepsilon$. We revisit this problem when the learner is also given as advice the parameters of a product distribution $Q$. We show that there is an efficient algorithm to learn $P$ within TV distance $\varepsilon$ that has sample complexity $\tilde{O}(d^{1-\eta}/\varepsilon^2)$, if $\|\mathbf{p} - \mathbf{q}\|_1 < \varepsilon\, d^{0.5 - \Omega(\eta)}$. Here, $\mathbf{p}$ and $\mathbf{q}$ are the mean vectors of $P$ and $Q$ respectively, and no bound on $\|\mathbf{p} - \mathbf{q}\|_1$ is known to the algorithm a priori.
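To make the baseline concrete: without advice, the standard learner simply estimates each coordinate's mean empirically, which is where the $\Theta(d/\varepsilon^2)$ sample complexity comes from. The sketch below simulates this advice-free baseline only; it is not the paper's sublinear algorithm, and the dimension, sample size, and the planted mean vector `p` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 50                                 # dimension of the hypercube {0,1}^d (illustrative)
p = rng.uniform(0.1, 0.9, size=d)      # hidden mean vector of P, planted for simulation

# Advice-free baseline: the empirical mean of n i.i.d. samples from P.
# Classically, n = Theta(d / eps^2) samples are necessary and sufficient
# for TV distance eps over product distributions on {0,1}^d.
n = 5000
samples = (rng.random((n, d)) < p).astype(np.int8)   # each row is one draw from P
p_hat = samples.mean(axis=0)                         # coordinate-wise empirical means

# Per-coordinate error; TV(P, P_hat) is controlled by suitably weighted
# coordinate-wise errors, so driving these down learns P in TV distance.
err = np.abs(p_hat - p)
print(err.max())
```

The paper's algorithm departs from this baseline by additionally using the advice vector $\mathbf{q}$: when $\|\mathbf{p}-\mathbf{q}\|_1$ is small, most coordinates need far less correction than a full fresh estimate, which is what enables the $\tilde{O}(d^{1-\eta}/\varepsilon^2)$ bound.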