🤖 AI Summary
This paper studies high-probability estimation of a discrete distribution $p$ with support size $K$ under KL divergence. It proposes a novel estimator based on online-to-batch conversion and suffix averaging, and is the first work to simultaneously establish tight high-probability upper and lower bounds on the KL estimation error. The estimator achieves, with probability at least $1-\delta$, a convergence rate of $O\big((K \log\log K + \log(K)\log(1/\delta))/n\big)$, matching the minimax lower bound up to logarithmic factors. Moreover, the paper provides the first high-probability characterization of the maximum likelihood estimator's performance under both $\chi^2$ and KL divergences. The key contribution is a computationally efficient and statistically near-optimal estimation framework, thereby closing the long-standing gap between high-probability upper and lower bounds for estimation in KL divergence.
📝 Abstract
We consider the problem of estimating a discrete distribution $p$ with support of size $K$ and provide both upper and lower bounds with high probability in KL divergence. We prove that in the worst case, for any estimator $\widehat{p}$, with probability at least $\delta$, $\text{KL}(p \,\|\, \widehat{p}) \geq C\max\{K, \ln(K)\ln(1/\delta)\}/n$, where $n$ is the sample size and $C > 0$ is a constant. We introduce a computationally efficient estimator $p^{\text{OTB}}$, based on Online to Batch conversion and suffix averaging, and show that with probability at least $1 - \delta$, $\text{KL}(p \,\|\, p^{\text{OTB}}) \leq C(K\log(\log(K)) + \ln(K)\ln(1/\delta))/n$.
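To make the online-to-batch construction concrete, here is a minimal sketch (not the paper's exact procedure): it assumes the online learner is an add-constant (Krichevsky–Trofimov style) sequential predictor, and that suffix averaging means averaging the predictive distributions produced over the last half of the rounds; the smoothing constant `alpha` and the suffix fraction are illustrative choices, not taken from the paper.

```python
import numpy as np

def otb_suffix_average(samples, K, alpha=0.5, suffix_frac=0.5):
    """Hypothetical sketch of an online-to-batch estimator with suffix averaging.

    Before seeing sample t, an add-constant (KT-style) online learner predicts
    q_t(k) = (counts[k] + alpha) / (t + K * alpha); the final estimate averages
    the predictions from the last `suffix_frac` fraction of rounds.  The choice
    of learner and constants here are assumptions, not the paper's construction.
    """
    n = len(samples)
    counts = np.zeros(K)
    predictions = []
    for t, x in enumerate(samples):
        q_t = (counts + alpha) / (t + K * alpha)   # prediction before observing x
        predictions.append(q_t)
        counts[x] += 1                             # online update
    start = int((1 - suffix_frac) * n)             # keep only the suffix of rounds
    return np.mean(predictions[start:], axis=0)

# Example usage: estimate a distribution on K = 5 symbols from n = 1000 samples.
rng = np.random.default_rng(0)
p = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
samples = rng.choice(5, size=1000, p=p)
p_hat = otb_suffix_average(samples, K=5)
print(p_hat, np.sum(p * np.log(p / p_hat)))        # estimate and KL(p || p_hat)
```

Because every prediction is strictly positive, the averaged estimate never assigns zero mass, so the KL divergence above is always finite.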
Furthermore, we also show that with sufficiently many observations relative to $\log(1/\delta)$, the maximum likelihood estimator $\bar{p}$ guarantees that, with probability at least $1-\delta$,
$$
\tfrac{1}{6}\,\chi^2(\bar{p} \,\|\, p) \leq \tfrac{1}{4}\,\chi^2(p \,\|\, \bar{p}) \leq \text{KL}(p \,\|\, \bar{p}) \leq C(K + \log(1/\delta))/n\,,
$$
where $\chi^2$ denotes the $\chi^2$-divergence.
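As a quick numerical illustration of the quantities in this display, the sketch below computes the maximum likelihood (empirical-frequency) estimator and the two divergences; the distribution, sample size, and seed are illustrative, and the printed ordering is the one the theorem asserts with high probability, not a pointwise identity.

```python
import numpy as np

def divergences(p, p_bar):
    """Compute chi^2(p_bar || p), chi^2(p || p_bar), and KL(p || p_bar).

    chi^2(q || r) = sum_k (q_k - r_k)^2 / r_k,  KL(q || r) = sum_k q_k log(q_k / r_k).
    All three are infinite if p_bar puts zero mass on a symbol with p_k > 0
    (the high-probability statement above excludes that event for large n).
    """
    chi2_bar_p = np.sum((p_bar - p) ** 2 / p)       # chi^2(p_bar || p)
    chi2_p_bar = np.sum((p - p_bar) ** 2 / p_bar)   # chi^2(p || p_bar)
    kl_p_bar = np.sum(p * np.log(p / p_bar))        # KL(p || p_bar)
    return chi2_bar_p, chi2_p_bar, kl_p_bar

# Example: maximum likelihood (empirical frequencies) on K = 5 symbols.
rng = np.random.default_rng(1)
p = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
samples = rng.choice(5, size=2000, p=p)
p_bar = np.bincount(samples, minlength=5) / len(samples)
c1, c2, kl = divergences(p, p_bar)
print(c1 / 6, c2 / 4, kl)  # typically ordered as in the display above
```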