🤖 AI Summary
This work investigates the fundamental statistical limits of classification and regression under label-only differential privacy (label DP)—a practical setting where only labels, not features, are protected—in both local (LDP) and central (CDP) models. Using multi-hypothesis testing, information-theoretic lower-bound analysis, and optimal perturbation mechanism design, the authors derive tight minimax convergence rates, establishing the first matching upper and lower bounds for this setting. The results show that label LDP achieves a polynomially faster risk convergence rate than full LDP (which protects both features and labels), whereas label CDP yields only a constant-factor improvement over full CDP. This work systematically characterizes the utility–privacy trade-off inherent in label protection, providing foundational theoretical guarantees and precise rate benchmarks for lightweight private learning.
📝 Abstract
Label differential privacy (DP) is designed for learning problems involving private labels and public features. While various methods have been proposed for learning under label DP, the theoretical limits remain largely unexplored. In this paper, we investigate the fundamental limits of learning with label DP in both local and central models for both classification and regression tasks, characterized by minimax convergence rates. We establish lower bounds by converting each task into a multiple hypothesis testing problem and bounding the test error. Additionally, we develop algorithms that yield matching upper bounds. Our results demonstrate that under label local DP (LDP), the risk converges significantly faster than under full LDP, i.e., protecting both features and labels, indicating the advantage of relaxing the DP definition to focus solely on labels. In contrast, under label central DP (CDP), the risk is only reduced by a constant factor compared to full DP, indicating that this relaxation of CDP offers only limited performance benefits.
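To make the label-LDP setting concrete, a standard mechanism for privatizing a discrete label is K-ary randomized response: the true label is reported with a probability calibrated to ε, and otherwise a uniformly random other label is sent. The sketch below is illustrative only (the function name and interface are ours, not the paper's), but it satisfies ε-label-LDP, since the ratio of output probabilities for any two input labels is at most e^ε.

```python
import math
import random

def randomized_response(label, num_classes, epsilon, rng=random):
    """K-ary randomized response on a label in {0, ..., num_classes - 1}.

    Reports the true label with probability
        p = e^eps / (e^eps + K - 1),
    and otherwise a uniformly random *other* label. This satisfies
    epsilon-label-LDP: for any output, the likelihood ratio between any
    two true labels is at most e^eps.
    """
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if rng.random() < p_keep:
        return label
    # Uniform over the remaining K - 1 labels.
    others = [k for k in range(num_classes) if k != label]
    return rng.choice(others)
```

Note that only the label passes through the mechanism; the features are released untouched, which is exactly the relaxation whose benefit (a polynomially faster rate in the local model) the paper quantifies.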