🤖 AI Summary
This work addresses differentially private $k$-means clustering in Euclidean space, where directly aggregating the private data yields a sensitivity that grows linearly with the size of the data domain. To overcome this limitation, the paper applies the Private Evolution (PE) framework to $k$-means clustering for the first time. The method builds a private histogram with constant sensitivity to guide the evolution and introduces evolutionary operators designed specifically for clustering, which lets it avoid high-sensitivity aggregation of the private data. Under formal differential privacy guarantees, it reduces clustering loss by 20% on average compared with state-of-the-art baselines.
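As a rough illustration of the sensitivity contrast described above, the sketch below measures how much two statistics change when a single record is added to a dataset. The first statistic is the coordinate-wise sum of the data. The second is a histogram that counts, for each candidate center, how many points have it as their nearest candidate. It assumes that points lie in a Euclidean ball of radius `radius`, that neighboring datasets differ by adding one record, and that nearest-candidate voting is a reasonable stand-in for the histogram PE uses. All helper names (`vote_histogram`, `sensitivity_demo`) are hypothetical.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a vector given as a list of numbers."""
    return math.sqrt(sum(c * c for c in v))


def vector_sum(points, dim):
    """Coordinate-wise sum of the points (the high-sensitivity statistic)."""
    total = [0.0] * dim
    for p in points:
        for j in range(dim):
            total[j] += p[j]
    return total


def vote_histogram(points, candidates):
    """Count, for each candidate center, how many points have it as nearest."""
    counts = [0] * len(candidates)
    for p in points:
        nearest = min(range(len(candidates)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(p, candidates[i])))
        counts[nearest] += 1
    return counts


def sensitivity_demo(radius, dim=2, n=100, num_candidates=8, seed=0):
    """Return (L2 change of the sum, L2 change of the histogram) when one
    record is added to data lying in a ball of the given radius."""
    rng = random.Random(seed)

    def point_on_sphere():
        # Points on the boundary sphere are the worst case for the sum.
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = l2_norm(v)
        return [radius * c / norm for c in v]

    data = [point_on_sphere() for _ in range(n)]
    neighbor = data + [point_on_sphere()]  # neighboring dataset: one extra record
    candidates = [point_on_sphere() for _ in range(num_candidates)]

    sum_change = l2_norm([a - b for a, b in zip(vector_sum(neighbor, dim),
                                                vector_sum(data, dim))])
    hist_change = l2_norm([a - b for a, b in zip(vote_histogram(neighbor, candidates),
                                                 vote_histogram(data, candidates))])
    return sum_change, hist_change


for r in (1.0, 10.0, 100.0):
    print(r, sensitivity_demo(r))
```

The change in the sum tracks `radius` up to floating-point rounding, while the histogram changes in exactly one bin by 1 at every radius. This is the "constant sensitivity" property, shown here under the add/remove-one neighboring relation assumed above.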
📝 Abstract
We study the problem of differentially private (DP) $k$-means clustering in Euclidean space. Previous solutions rely on summing the private data directly, which induces a sensitivity proportional to the size of the domain. We introduce PE-means, which extends the private evolution (PE) algorithm, an increasingly popular method for synthetic data generation, to $k$-means clustering. The key advantage of PE is that it only needs to compute a private histogram, which has constant sensitivity, to guide the evolution. Our adaptation of PE includes new evolutionary operators for clustering, as well as other algorithmic improvements of independent interest. Overall, PE-means achieves an average improvement of 20% in clustering loss over state-of-the-art baselines.
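To make the idea concrete, here is a minimal PE-style loop for $k$-means. It is a sketch under stated assumptions and should not be read as PE-means itself: the abstract does not specify the paper's evolutionary operators or its privacy accounting. The Gaussian-jitter variation step, top-$k$ selection, shrinking step size, the box domain $[-R, R]^d$, and all function and parameter names (`pe_kmeans_sketch`, `noisy_votes`, `sigma`, `population`, `rounds`) are therefore hypothetical. The loop touches the private data only through a nearest-candidate vote histogram with sensitivity 1, to which Gaussian noise is added. Calibrating `sigma` to a target privacy level across all rounds is left out.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans_cost(points, centers):
    """k-means loss: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def noisy_votes(points, candidates, sigma, rng):
    """Private histogram: each point votes for its nearest candidate.

    Adding or removing one point changes a single count by 1, so the
    sensitivity is 1 regardless of the domain radius. Gaussian noise with
    standard deviation `sigma` is added; calibrating `sigma` to a target
    (epsilon, delta) over all rounds is NOT done here.
    """
    counts = [0.0] * len(candidates)
    for p in points:
        counts[nearest(p, candidates)] += 1.0
    return [c + rng.gauss(0.0, sigma) for c in counts]


def top_k(candidates, votes, k):
    """Candidates with the k largest noisy vote counts."""
    order = sorted(range(len(candidates)), key=lambda i: votes[i], reverse=True)
    return [candidates[i] for i in order[:k]]


def pe_kmeans_sketch(points, k, radius, rounds=10, population=40,
                     sigma=1.0, step=None, seed=0):
    """Hypothetical PE-style search for k centers in the box [-radius, radius]^d."""
    assert population > k
    rng = random.Random(seed)
    dim = len(points[0])
    step = radius / 4 if step is None else step
    # The initial population is drawn without looking at the private data.
    candidates = [[rng.uniform(-radius, radius) for _ in range(dim)]
                  for _ in range(population)]
    for _ in range(rounds):
        survivors = top_k(candidates, noisy_votes(points, candidates, sigma, rng), k)
        # Hypothetical variation operator: Gaussian perturbations of survivors,
        # clipped to the box, with a step size that shrinks every round.
        children = []
        while len(survivors) + len(children) < population:
            parent = rng.choice(survivors)
            children.append([min(radius, max(-radius, x + rng.gauss(0.0, step)))
                             for x in parent])
        candidates = survivors + children
        step *= 0.7
    # One more noisy histogram picks the k centers that are released.
    return top_k(candidates, noisy_votes(points, candidates, sigma, rng), k)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: three well-separated blobs inside the box [-10, 10]^2.
    data = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
            for cx, cy in [(-5, -5), (5, 5), (5, -5)] for _ in range(100)]
    centers = pe_kmeans_sketch(data, k=3, radius=10.0)
    print(centers, kmeans_cost(data, centers))
```

The initial population is data-independent, and the variation step only perturbs candidates. As a result, the released centers depend on the private data solely through the noisy vote counts, which is the property the abstract credits to PE. Each round spends privacy budget on a fresh histogram, so in this sketch the number of rounds and `sigma` jointly determine the overall privacy cost.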