🤖 AI Summary
This work proposes a general stratification framework that augments existing subsampling designs to address their inefficiency in large-scale data settings. By incorporating, for the first time, a maximum-variance-reduction objective into the stratification strategy, the method selects both the stratification variable and the interval boundaries to minimize the asymptotic variance of the resulting estimator, yielding a theoretically grounded, efficient selection criterion applicable to any subsampling design. Guided by an asymptotic normality analysis, the proposed algorithm runs in linear time and is compatible with both uniform and non-uniform subsampling schemes. Experiments on simulated and real-world datasets demonstrate that the approach substantially reduces estimation variance and improves accuracy at only a linear additional computational cost.
📝 Abstract
Subsampling is a widely used and effective approach for addressing the computational challenges posed by massive datasets. Substantial progress has been made in developing non-uniform, probability-based subsampling schemes that prioritize more informative observations. We propose a novel stratification mechanism that can be combined with existing subsampling designs to further improve estimation efficiency. We establish the estimator's asymptotic normality and quantify the resulting efficiency gain, enabling a principled procedure for selecting stratification variables and interval boundaries that targets reductions in asymptotic variance. The resulting algorithm, Maximum-Variance-Reduction Stratification (MVRS), achieves significant improvements in estimation efficiency while incurring only linear additional computational cost, and is applicable to both non-uniform and uniform subsampling methods. Experiments on simulated and real datasets confirm that MVRS markedly reduces estimator variance and improves accuracy compared with existing subsampling methods.
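To make the workflow concrete, here is a minimal sketch of stratified subsampling with a variance-reduction boundary search. The paper derives its criterion from the asymptotic variance of the subsampling estimator; the sketch substitutes a simple between-stratum variance proxy, quantile-based candidate boundaries, and proportional allocation with uniform within-stratum sampling, so it illustrates the general idea rather than the authors' MVRS objective. All names and parameters below (`choose_boundaries`, `stratified_subsample`, `n_candidates`, etc.) are hypothetical.

```python
# Hedged sketch: stratified subsampling of a scalar statistic (the mean),
# with interval boundaries on a stratification variable z chosen to
# maximize a between-stratum variance proxy. Larger between-stratum
# variance implies a larger variance reduction for the stratified
# estimator relative to plain uniform subsampling.
import numpy as np

def choose_boundaries(z, y, n_strata, n_candidates=20):
    """Search shifted quantile grids on z; keep the boundary set whose
    strata separate y the most (between-stratum variance proxy)."""
    best_edges, best_gain = None, -np.inf
    qs = np.linspace(0, 1, n_strata + 1)
    for shift in np.linspace(-0.5, 0.5, n_candidates) / n_strata:
        q = np.clip(qs[1:-1] + shift, 0.01, 0.99)
        edges = np.quantile(z, q)
        labels = np.searchsorted(edges, z)
        gain = sum(
            (labels == h).mean() * (y[labels == h].mean() - y.mean()) ** 2
            for h in range(n_strata) if (labels == h).any()
        )
        if gain > best_gain:
            best_gain, best_edges = gain, edges
    return best_edges

def stratified_subsample(y, z, edges, n_sub, rng):
    """Draw about n_sub points with proportional allocation across strata,
    uniformly within each stratum; return the stratified mean estimate."""
    labels = np.searchsorted(edges, z)
    estimate = 0.0
    for h in np.unique(labels):
        idx = np.flatnonzero(labels == h)
        n_h = max(1, round(n_sub * len(idx) / len(z)))
        take = rng.choice(idx, size=min(n_h, len(idx)), replace=False)
        estimate += (len(idx) / len(z)) * y[take].mean()
    return estimate

rng = np.random.default_rng(0)
z = rng.normal(size=100_000)
y = 2.0 * z + rng.normal(size=z.size)  # y strongly tied to z, so stratifying on z helps
edges = choose_boundaries(z, y, n_strata=5)
est = stratified_subsample(y, z, edges, n_sub=1_000, rng=rng)
print(f"stratified subsample mean: {est:.4f} (full-data mean {y.mean():.4f})")
```

Note that each candidate boundary set costs one pass over the data, so the search adds only linear overhead, consistent with the linear additional cost claimed for MVRS.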