PE-means: Improved Differentially Private $k$-means Clustering through Private Evolution

📅 2026-05-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a core difficulty in differentially private k-means clustering in Euclidean space: directly aggregating private data incurs a sensitivity that scales linearly with the size of the data domain. To overcome this limitation, the paper introduces, for the first time, the Private Evolution (PE) framework into k-means clustering. By building a private histogram with constant sensitivity to guide the evolution, and by designing evolutionary operators specific to clustering, the method avoids high-sensitivity data aggregation. Under rigorous differential privacy guarantees, the approach reduces clustering loss by 20% on average relative to state-of-the-art baselines.
📝 Abstract
We study the problem of differentially private (DP) $k$-means clustering in Euclidean space. Previous solutions rely on summing the private data directly, which induces a sensitivity proportional to the domain. We introduce PE-means, an extension of the private evolution (PE) algorithm (an increasingly popular method for synthetic data generation), to the problem of $k$-means clustering. The key advantage of PE is that it only computes a private histogram with constant sensitivity to guide the evolution. Our adaptation of PE includes new evolutionary operators for clustering, as well as other algorithmic improvements of independent interest. Overall, PE-means achieves an average improvement of 20% in clustering loss over state-of-the-art baselines.
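The constant-sensitivity claim can be illustrated with a minimal sketch of a nearest-candidate voting histogram, the kind of statistic PE-style methods privatize. This is not the paper's implementation: the function names, the toy data, and the noise scale `sigma` are all assumptions made for illustration, and `sigma` is not calibrated to any particular privacy budget. Each private point casts one vote for its nearest candidate, so adding or removing one point changes a single bin by 1, however large the data domain is.

```python
import random

def nearest_index(point, candidates):
    """Return the index of the candidate closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(candidates)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, candidates[j])),
    )

def vote_histogram(points, candidates):
    """Each private point casts exactly one vote for its nearest candidate."""
    hist = [0] * len(candidates)
    for point in points:
        hist[nearest_index(point, candidates)] += 1
    return hist

def noisy_vote_histogram(points, candidates, sigma, rng):
    """Add Gaussian noise to every bin.

    Assumption: sigma is a placeholder; a real deployment would derive it
    from the target privacy parameters and the sensitivity of 1.
    """
    return [count + rng.gauss(0.0, sigma) for count in vote_histogram(points, candidates)]

# Hypothetical toy data on a wide domain, [-100, 100]^2.
rng = random.Random(0)
points = [(rng.uniform(-100, 100), rng.uniform(-100, 100)) for _ in range(500)]
candidates = [(rng.uniform(-100, 100), rng.uniform(-100, 100)) for _ in range(8)]

hist = vote_histogram(points, candidates)
# Removing one private point changes exactly one bin by exactly 1.
hist_without_first = vote_histogram(points[1:], candidates)
noisy = noisy_vote_histogram(points, candidates, sigma=2.0, rng=rng)
```

By contrast, a coordinate-wise sum over the same points can shift by up to the domain radius when one point is added or removed, so the noise needed to privatize it grows with the domain.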
Problem

Research questions and friction points this paper is trying to address.

differentially private
k-means clustering
private evolution
sensitivity
Euclidean space
Innovation

Methods, ideas, or system contributions that make the work stand out.

differentially private clustering
private evolution
k-means
synthetic data generation
constant sensitivity