🤖 AI Summary
This work addresses the challenge of preserving privacy when models are deployed behind APIs, a setting where traditional differential privacy methods such as DP-SGD suffer significant utility degradation from high-dimensional noise and offer weak protection against large numbers of adaptively chosen adversarial queries. Within the PAC privacy framework, the authors propose an instance-level privacy constraint based on mutual information (MI), coupled with an adaptive noise calibration mechanism. This approach provides, for the first time, a cumulative MI guarantee that grows only linearly over sequences of adaptively chosen adversarial queries, circumventing the limitations of classical composition theorems. The method achieves 87.79% accuracy on CIFAR-10 under a per-query MI budget of \(2^{-32}\), with membership inference attack (MIA) success provably bounded at 51.08% after one million queries. Furthermore, via private response distillation with an ImageNet subset as public data, a model distilled from 210,000 private responses attains 91.86% accuracy on CIFAR-10 while bounding MIA success at 50.49%.
📝 Abstract
Modern machine learning models are increasingly deployed behind APIs. This renders standard weight-privatization methods (e.g., DP-SGD) unnecessarily noisy, sacrificing utility. While model weights may vary significantly across training datasets, model responses to specific inputs are much lower-dimensional and more stable. This motivates enforcing privacy guarantees directly on model outputs. We approach this under PAC privacy, which provides instance-based privacy guarantees for arbitrary black-box functions by controlling mutual information (MI). Importantly, PAC privacy explicitly rewards output stability with reduced noise levels. However, a central challenge remains: response privacy requires composing a large number of adaptively chosen, potentially adversarial queries issued by untrusted users, a regime in which existing PAC privacy composition results are inadequate. We introduce a new algorithm that achieves adversarial composition via adaptive noise calibration and prove that mutual information guarantees accumulate linearly under adaptive and adversarial querying. Experiments across tabular, vision, and NLP tasks show that our method achieves high utility at extremely small per-query privacy budgets. On CIFAR-10, we achieve 87.79% accuracy with a per-step MI budget of $2^{-32}$. This enables serving one million queries while provably bounding membership inference attack (MIA) success rates to 51.08% -- matching the guarantee of $(0.04, 10^{-5})$-DP. Furthermore, we show that private responses can be used to label public data and distill a publishable privacy-preserving model: using an ImageNet subset as the public dataset, our model distilled from 210,000 responses achieves 91.86% accuracy on CIFAR-10 with MIA success upper-bounded by 50.49%, comparable to $(0.02, 10^{-5})$-DP.
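The core idea of MI-calibrated output noise can be illustrated with a minimal sketch. For a single scalar query, the Gaussian-channel capacity bound $I(D; f(D)+Z) \le \tfrac{1}{2}\log_2\!\big(1 + \mathrm{Var}[f(D)]/\sigma^2\big)$ lets one solve for the smallest noise scale $\sigma$ that meets a per-query MI budget of $\beta$ bits: $\sigma^2 \ge \mathrm{Var}[f(D)] / (2^{2\beta} - 1)$. The function names and the single-scalar-output setting below are illustrative assumptions, not the paper's actual algorithm, which handles adaptive adversarial query sequences and proves linear MI accumulation across them.

```python
import math
import numpy as np

def calibrate_noise(outputs, mi_budget_bits):
    """Return the Gaussian noise std needed so a noisy scalar response
    leaks at most `mi_budget_bits` of mutual information.

    `outputs` holds the query's answer under many training sets resampled
    from the data distribution; its variance is the "signal power" in the
    channel bound I <= 0.5 * log2(1 + Var / sigma^2), which rearranges to
    sigma^2 >= Var / (2^(2*budget) - 1).
    """
    signal_var = float(np.var(outputs))
    sigma2 = signal_var / (2.0 ** (2.0 * mi_budget_bits) - 1.0)
    return math.sqrt(sigma2)

def answer_query(outputs, query_output, mi_budget_bits, rng):
    """Release one response for this query under the per-query MI budget."""
    sigma = calibrate_noise(outputs, mi_budget_bits)
    return query_output + rng.normal(0.0, sigma)
```

Note how the sketch captures the abstract's key point: the noise scale is proportional to the output's standard deviation across training sets, so stable responses are rewarded with less noise, while shrinking the per-query budget (e.g., toward $2^{-32}$ bits) inflates the required noise for any fixed output variance.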