Stable coresets: Unleashing the power of uniform sampling

📅 2025-09-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Background: Uniform sampling is a highly efficient way to summarize data. However, it generally does not yield a strong coreset for clustering, which is the prevailing notion in the literature, so its effectiveness for building clustering coresets is not well understood. Method: This paper introduces stable coresets, a notion intermediate between weak coresets (efficiently constructible by uniform sampling but limited in applicability) and strong coresets (broadly applicable but generally not attainable by uniform sampling). Contribution/Results: The main result is that a uniform sample of size O(ε⁻² log d) yields, with high constant probability, a stable coreset for 1-median in ℝᵈ under the ℓ₁ metric. Building on the properties of stable coresets, the paper derives new coreset constructions, all through uniform sampling, for ℓ₁ and related metrics such as Kendall-tau and Jaccard. It also shows applications to fair clustering and to approximation algorithms for k-median in these spaces. Experiments indicate benefits in both construction time and approximation quality.
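
The following minimal Python sketch illustrates the uniform-sampling idea for 1-median under ℓ₁. It is not the paper's construction or analysis. It relies on the standard fact that the ℓ₁ 1-median of a point set is its coordinate-wise median. The sketch draws a uniform sample of size ⌈c·ε⁻²·ln d⌉, solves 1-median on the sample only, and measures how that center performs on the full data. The constant c = 1 is a placeholder assumption, since the O(·) hides the real constant. The helper names and the Gaussian data are hypothetical.

```python
import math
import random
import statistics


def l1_cost(points, center):
    """Total l1 distance from every point to a single center."""
    return sum(sum(abs(p - c) for p, c in zip(pt, center)) for pt in points)


def l1_median(points):
    """Exact 1-median under l1: the median of each coordinate separately."""
    d = len(points[0])
    return [statistics.median([pt[j] for pt in points]) for j in range(d)]


def uniform_sample_median(points, eps, c=1.0, rng=random):
    """Hypothetical helper: draw a uniform sample of size ceil(c * eps^-2 * ln d)
    (c is a placeholder constant) and solve 1-median on the sample only."""
    d = len(points[0])
    m = min(len(points), math.ceil(c * eps ** -2 * math.log(max(d, 2))))
    sample = rng.sample(points, m)
    return l1_median(sample), m


# Synthetic data for illustration only: n points with d Gaussian coordinates.
rng = random.Random(0)
n, d = 5000, 50
points = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]

opt = l1_cost(points, l1_median(points))       # optimal full-data cost
center, m = uniform_sample_median(points, eps=0.2, rng=rng)
ratio = l1_cost(points, center) / opt          # >= 1 because opt is optimal
print(f"sample size m = {m}, cost ratio = {ratio:.4f}")
```

Comparing against the exact full-data median keeps the check honest. The ratio is at least 1 by construction, and how far above 1 it lands shows how much is lost by optimizing on the sample alone. The sketch checks only this weak-coreset-style behavior, not the stronger stable-coreset property defined in the paper.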

📝 Abstract

Uniform sampling is a highly efficient method for data summarization. However, its effectiveness in producing coresets for clustering problems is not yet well understood, primarily because it generally does not yield a strong coreset, which is the prevailing notion in the literature. We formulate stable coresets, a notion that is intermediate between the standard notions of weak and strong coresets, and effectively combines the broad applicability of strong coresets with highly efficient constructions, through uniform sampling, of weak coresets. Our main result is that a uniform sample of size $O(\varepsilon^{-2}\log d)$ yields, with high constant probability, a stable coreset for $1$-median in $\mathbb{R}^d$ under the $\ell_1$ metric. We then leverage the powerful properties of stable coresets to easily derive new coreset constructions, all through uniform sampling, for $\ell_1$ and related metrics, such as Kendall-tau and Jaccard. We also show applications to fair clustering and to approximation algorithms for $k$-median problems in these metric spaces. Our experiments validate the benefits of stable coresets in practice, in terms of both construction time and approximation quality.
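
The abstract groups Kendall-tau and Jaccard with ℓ₁ as related metrics. The Python sketch below shows a standard way to see the link, which is not necessarily the reduction the paper uses. The Kendall-tau distance between two rankings equals the ℓ₁ distance between their pairwise-order indicator vectors, with one 0/1 coordinate per item pair. The Jaccard distance between two sets equals the ℓ₁ distance between their indicator vectors, divided by the size of their union. All function names are illustrative.

```python
from itertools import combinations


def l1(x, y):
    """l1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pair_encoding(ranking):
    """Encode a ranking of items 0..n-1 with one 0/1 coordinate per pair i < j:
    1 if item i is ranked ahead of item j, else 0."""
    pos = {item: r for r, item in enumerate(ranking)}
    return [1 if pos[i] < pos[j] else 0
            for i, j in combinations(range(len(ranking)), 2)]


def kendall_tau(r1, r2):
    """Number of item pairs that the two rankings order differently."""
    p1 = {item: r for r, item in enumerate(r1)}
    p2 = {item: r for r, item in enumerate(r2)}
    return sum(1 for i, j in combinations(sorted(p1), 2)
               if (p1[i] < p1[j]) != (p2[i] < p2[j]))


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|, with the distance between two empty sets taken as 0."""
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)


def indicator(s, universe):
    """0/1 membership vector of set s over an ordered universe."""
    return [1 if u in s else 0 for u in universe]


r1, r2 = [0, 1, 2, 3], [1, 0, 3, 2]
print(kendall_tau(r1, r2), l1(pair_encoding(r1), pair_encoding(r2)))  # prints: 2 2

A, B = {1, 2, 3}, {2, 3, 4}
U = sorted(A | B)
print(jaccard_distance(A, B), l1(indicator(A, U), indicator(B, U)) / len(U))  # prints: 0.5 0.5
```

The Kendall-tau encoding is exact, but it uses n(n−1)/2 coordinates for rankings of n items. The Jaccard normalization depends on the pair of sets being compared, so Jaccard is ℓ₁-like rather than an exact ℓ₁ embedding.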

Problem

Developing stable coresets bridging weak and strong coreset properties

Understanding whether uniform sampling, which generally does not yield strong coresets, can still produce useful coresets for clustering

Applying stable coresets to k-median problems and fair clustering

Innovation

Introduces stable coresets bridging weak and strong coresets

Shows that a uniform sample of size O(ε⁻² log d) is, with high constant probability, a stable coreset for 1-median in ℝᵈ under ℓ₁

Applies stable coresets to fair clustering and to approximation algorithms for k-median under ℓ₁, Kendall-tau, and Jaccard
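
A common way a coreset enters an approximation algorithm is to run the algorithm, or compare its candidate solutions, on the weighted sample instead of the full data. The paper's actual k-median algorithms are not reproduced here. The Python sketch below shows only the generic pattern, and its helper names are hypothetical. Each of m uniformly sampled points gets weight n/m, so the weighted sample cost is an unbiased estimate of the full cost for any fixed set of centers. Candidate center sets are then ranked by their cost on the weighted sample. The candidates and the two-cluster data are made up for illustration.

```python
import random


def l1(x, y):
    """l1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def kmedian_cost(points, centers, weights=None):
    """Weighted k-median cost: each point pays its l1 distance to the nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(l1(p, c) for c in centers) for p, w in zip(points, weights))


def uniform_coreset(points, m, rng=random):
    """Hypothetical helper: m uniform samples, each weighted n/m so weights sum to n."""
    return rng.sample(points, m), [len(points) / m] * m


def best_candidate(sample, weights, candidates):
    """Return the candidate center set with the lowest cost on the weighted sample."""
    return min(candidates, key=lambda centers: kmedian_cost(sample, centers, weights))


# Synthetic data: two well-separated Gaussian clusters in the plane.
rng = random.Random(0)
points = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
          + [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(2000)])

candidates = [
    [[0, 0], [10, 10]],  # one center per cluster
    [[0, 0], [1, 1]],    # both centers in the first cluster
    [[5, 5], [6, 6]],    # both centers between the clusters
]
sample, weights = uniform_coreset(points, 200, rng)
chosen = best_candidate(sample, weights, candidates)
print("chosen:", chosen)
print("sample estimate:", round(kmedian_cost(sample, chosen, weights)),
      "full cost:", round(kmedian_cost(points, chosen)))
```

Whether it is safe to choose among candidates on the sample depends on what guarantee the sample carries. Distinguishing such guarantees (weak, strong, and the intermediate stable notion) is the subject of the paper.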

🔎 Similar Papers

In-and-Out: Algorithmic Diffusion for Sampling Convex Bodies