🤖 AI Summary
Scalable MCMC methods used in modern Bayesian inference, such as SGLD, introduce bias that invalidates conventional sample-quality diagnostics such as effective sample size (ESS). These diagnostics are especially uninformative about multivariate dependence structure, which is often the core inferential target. To address this, we propose the Copula Discrepancy (CD), a diagnostic that uses Sklar's theorem to separate a sample's dependence structure from its marginals and quantify how faithfully that structure is reproduced, giving the first structure-aware framework for evaluating biased samplers. CD comes in two variants: a moment-based estimator and a robust MLE-based estimator. The MLE variant detects tail-dependence mismatches even when Kendall's tau agrees, and so distinguishes samples with different extreme-event behavior that rank-correlation-based diagnostics cannot separate. Both variants have computational overhead orders of magnitude lower than Stein-based alternatives. In experiments, the moment-based CD outperforms ESS and other standard diagnostics for hyperparameter selection in biased MCMC, correctly identifying optimal configurations where traditional methods fail.
📝 Abstract
The scalable Markov chain Monte Carlo (MCMC) algorithms that underpin modern Bayesian machine learning, such as Stochastic Gradient Langevin Dynamics (SGLD), sacrifice asymptotic exactness for computational speed, creating a critical diagnostic gap: traditional sample quality measures fail catastrophically when applied to biased samplers. While powerful Stein-based diagnostics can detect distributional mismatches, they provide no direct assessment of dependence structure, often the primary inferential target in multivariate problems. We introduce the Copula Discrepancy (CD), a principled and computationally efficient diagnostic that leverages Sklar's theorem to isolate and quantify the fidelity of a sample's dependence structure independent of its marginals. Our theoretical framework provides the first structure-aware diagnostic specifically designed for the era of approximate inference. Empirically, we demonstrate that a moment-based CD dramatically outperforms standard diagnostics like effective sample size for hyperparameter selection in biased MCMC, correctly identifying optimal configurations where traditional methods fail. Furthermore, our robust MLE-based variant can detect subtle but critical mismatches in tail dependence that remain invisible to rank correlation-based approaches, distinguishing between samples with identical Kendall's tau but fundamentally different extreme-event behavior. With computational overhead orders of magnitude lower than existing Stein discrepancies, the CD provides both immediate practical value for MCMC practitioners and a theoretical foundation for the next generation of structure-aware sample quality assessment.
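The claim that two samples can share Kendall's tau yet differ in extreme-event behavior can be illustrated with two standard copula families. The sketch below is not the paper's CD implementation. It assumes Clayton and Gumbel copulas as example families (the abstract names none), uses their textbook relations between Kendall's tau, the copula parameter, and the tail-dependence coefficients, and treats the absolute gap between a fitted and a target parameter as a hypothetical stand-in for a moment-based discrepancy. All function names are illustrative.

```python
import random


def clayton_theta_from_tau(tau):
    """Invert tau = theta / (theta + 2), the Clayton relation."""
    return 2.0 * tau / (1.0 - tau)


def gumbel_theta_from_tau(tau):
    """Invert tau = 1 - 1 / theta, the Gumbel relation."""
    return 1.0 / (1.0 - tau)


def clayton_lower_tail(theta):
    """Lower tail dependence of Clayton; its upper tail dependence is 0."""
    return 2.0 ** (-1.0 / theta)


def gumbel_upper_tail(theta):
    """Upper tail dependence of Gumbel; its lower tail dependence is 0."""
    return 2.0 - 2.0 ** (1.0 / theta)


def sample_clayton(n, theta, rng):
    """Marshall-Olkin sampler: U_i = (1 + E_i / V) ** (-1 / theta),
    with V ~ Gamma(1 / theta, 1) and E_i ~ Exp(1) independent."""
    pairs = []
    for _ in range(n):
        v = rng.gammavariate(1.0 / theta, 1.0)
        u1 = (1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta)
        u2 = (1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta)
        pairs.append((u1, u2))
    return pairs


def kendall_tau(pairs):
    """Plain O(n^2) Kendall's tau (tau-a), adequate for tie-free data."""
    n = len(pairs)
    score = 0
    for i in range(n):
        xi, yi = pairs[i]
        for j in range(i + 1, n):
            xj, yj = pairs[j]
            prod = (xi - xj) * (yi - yj)
            score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * score / (n * (n - 1))


# Two families matched to the same Kendall's tau.
tau_target = 0.5
theta_clayton = clayton_theta_from_tau(tau_target)
theta_gumbel = gumbel_theta_from_tau(tau_target)
print("Clayton lower-tail dependence:", clayton_lower_tail(theta_clayton))
print("Gumbel upper-tail dependence:", gumbel_upper_tail(theta_gumbel))

# Hypothetical moment-based check: estimate tau from a sample, invert it to
# a parameter, and compare with the target parameter.
sample = sample_clayton(500, theta_clayton, random.Random(0))
theta_hat = clayton_theta_from_tau(kendall_tau(sample))
print("moment-style parameter gap:", abs(theta_hat - theta_clayton))
```

Because both families are matched to the same tau, a diagnostic built only on rank correlation cannot tell them apart, while their tail-dependence coefficients differ and sit in opposite tails. That is the kind of mismatch the abstract attributes to the MLE-based variant.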