Data augmented bootstrap: Unifying confidence interval construction by approximate invariance

📅 2026-06-08
🤖 AI Summary
This work addresses the construction of confidence intervals with reliable coverage when exact group symmetry is unavailable. It proposes the data augmented bootstrap (DAB), a framework that brings data augmentation into statistical inference by exploiting approximate invariance of the data under transformations. DAB recovers the classical bootstrap, conformal prediction, the wild bootstrap for Maximum Mean Discrepancy U-statistics, and SymmPI as special cases of a single formulation, with approximate invariance measured in the Kolmogorov distance. Its coverage results interpolate between finite-sample and asymptotic guarantees according to the strength of the invariance and do not assume a group structure; for statistics that satisfy Gaussian universality, the invariance condition reduces to matching conditional means and variances. The authors test the effect of incorporating data augmentation into the bootstrap, wild bootstrap, and conformal prediction on simulated settings and on image, language, and scientific data.
📝 Abstract
We propose the data augmented bootstrap (DAB), a framework for constructing confidence intervals from approximately invariant transformations of the data. As special cases, DAB recovers popular methods that rely on exact group symmetries, such as conformal prediction, wild bootstrap for Maximum Mean Discrepancy U-statistics and the recently proposed SymmPI. Meanwhile, DAB also recovers the classical bootstrap method, which exploits the dataset's approximate invariance under uniform sampling of data indices as the dataset size grows. For all DAB methods, we establish theoretical coverage results that interpolate between finite-sample and asymptotic guarantees according to the strength of the invariance, and without assuming a group structure. The approximate invariance is measured in the Kolmogorov distance and, for statistics that satisfy Gaussian universality, reduces to conditional mean and variance matching. This allows us to incorporate data augmentation (DA), a widely used machine learning heuristic based on approximate invariances, into known statistical methods. We empirically test the performance of incorporating DA into bootstrap, wild bootstrap and conformal prediction for simulated settings as well as for image, language and scientific data.
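As a rough illustration of two ingredients the abstract names, the sketch below implements the classical percentile bootstrap (resampling data indices uniformly with replacement) and an empirical Kolmogorov distance between two samples. It is not the paper's DAB procedure: the function names `percentile_bootstrap_ci` and `kolmogorov_distance`, the percentile interval rule, and all default parameters are assumptions chosen for illustration only.

```python
import random
import statistics


def percentile_bootstrap_ci(data, stat=statistics.fmean, n_boot=2000,
                            alpha=0.1, seed=0):
    """Percentile bootstrap interval for stat(data) (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample indices uniformly with replacement, as in the classical bootstrap.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(stat(sample))
    replicates.sort()
    # Take empirical alpha/2 and 1 - alpha/2 quantiles of the replicates.
    lo = replicates[int((alpha / 2) * n_boot)]
    hi = replicates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


def kolmogorov_distance(xs, ys):
    """sup_t |F_xs(t) - F_ys(t)| between two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    dist = 0.0
    while i < len(xs) and j < len(ys):
        t = min(xs[i], ys[j])
        # Advance both pointers past every value <= t, so ties are handled together.
        while i < len(xs) and xs[i] <= t:
            i += 1
        while j < len(ys) and ys[j] <= t:
            j += 1
        dist = max(dist, abs(i / len(xs) - j / len(ys)))
    return dist


if __name__ == "__main__":
    data = [float(v) for v in range(100)]
    print(percentile_bootstrap_ci(data))
    print(kolmogorov_distance([1, 2], [2, 3]))
```

In this toy setting, `kolmogorov_distance` could be applied to the values of a statistic computed on original and on transformed data to get a crude sense of how close the two distributions are; how the paper actually uses the distance to obtain coverage guarantees is not specified in the abstract.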
Problem

Research questions and friction points this paper is trying to address.

confidence intervals
approximate invariance
data augmentation
bootstrap
coverage guarantees
Innovation

Methods, ideas, or system contributions that make the work stand out.

data augmented bootstrap
approximate invariance
confidence intervals
data augmentation
Kolmogorov distance