Differentially Private Explanations for Clusters

📅 2025-06-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of low interpretability in black-box clustering under differential privacy (DP) constraints. We propose DPClustX, the first end-to-end DP-guaranteed framework for global explanation of clustering results. Its core innovation lies in jointly optimizing sensitivity analysis, post-hoc clustering label refinement, noisy feature importance aggregation, and statistical significance testing—enabling robust cluster-level feature summarization under stringent privacy budgets (ε ≤ 1). Compared to baseline methods, DPClustX achieves over 82% explanation accuracy across multiple real-world datasets, while simultaneously ensuring high explanation fidelity, efficient privacy budget utilization, and strong robustness to noise. To our knowledge, this is the first systematic solution to the interpretability problem in DP-constrained clustering.

Technology Category

Application Category

📝 Abstract
The dire need to protect sensitive data has led to various flavors of privacy definitions. Among these, Differential privacy (DP) is considered one of the most rigorous and secure notions of privacy, enabling data analysis while preserving the privacy of data contributors. One of the fundamental tasks of data analysis is clustering , which is meant to unravel hidden patterns within complex datasets. However, interpreting clustering results poses significant challenges, and often necessitates an extensive analytical process. Interpreting clustering results under DP is even more challenging, as analysts are provided with noisy responses to queries, and longer, manual exploration sessions require additional noise to meet privacy constraints. While increasing attention has been given to clustering explanation frameworks that aim at assisting analysts by automatically uncovering the characteristics of each cluster, such frameworks may also disclose sensitive information within the dataset, leading to a breach in privacy. To address these challenges, we present DPClustX, a framework that provides explanations for black-box clustering results while satisfying DP. DPClustX takes as input the sensitive dataset alongside privately computed clustering labels, and outputs a global explanation, emphasizing prominent characteristics of each cluster while guaranteeing DP. We perform an extensive experimental analysis of DPClustX on real data, showing that it provides insightful and accurate explanations even under tight privacy constraints.
Problem

Research questions and friction points this paper is trying to address.

Provide differentially private explanations for clustering results
Interpret noisy clustering outputs under privacy constraints
Balance cluster explanation accuracy with rigorous data privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentially private clustering explanation framework
Global explanations for black-box clusters
Guarantees privacy while analyzing sensitive data
🔎 Similar Papers
No similar papers found.