🤖 AI Summary
Machine unlearning faces a critical challenge from adversarial data fabrication -- attackers can construct gradient-similar synthetic samples that create the *appearance* of forgetting target data while the underlying information is preserved. Method: The paper introduces, for the first time, the concept of an $ε$-forging set and establishes a measure-theoretic detection framework. Leveraging gradient analysis, regularity assumptions, and smooth loss function theory, the approach applies to linear regression, single-layer neural networks, and mini-batch SGD. Contribution/Results: The authors prove that the Lebesgue measure of the $ε$-forging set decays at rate $ε^{(d-r)/2}$, revealing an intrinsic limitation on adversarial forgery under non-degenerate data distributions in high dimensions. Consequently, the probability that random sampling hits a forged point vanishes asymptotically, yielding the first theoretically grounded, computationally feasible criterion for verifying genuine unlearning.
📝 Abstract
Motivated by privacy regulations and the need to mitigate the effects of harmful data, machine unlearning seeks to modify trained models so that they effectively "forget" designated data. A key challenge in verifying unlearning is forging -- adversarially crafting data that mimics the gradient of a target point, thereby creating the appearance of unlearning without actually removing information. To capture this phenomenon, we consider the collection of data points whose gradients approximate a target gradient within tolerance $ε$ -- which we call an $ε$-forging set -- and develop a framework for its analysis. For linear regression and one-layer neural networks, we show that the Lebesgue measure of this set is small: it scales on the order of $ε$ and, for sufficiently small $ε$, on the order of $ε^d$. More generally, under mild regularity assumptions, we prove that the forging set measure decays as $ε^{(d-r)/2}$, where $d$ is the data dimension and $r<d$ is the nullity of a variation matrix defined by the model gradients. Extensions to batch SGD and almost-everywhere smooth loss functions yield the same asymptotic scaling. In addition, we establish probability bounds showing that, under non-degenerate data distributions, the likelihood of randomly sampling a forging point is vanishingly small. These results provide evidence that adversarial forging is fundamentally limited and that false unlearning claims can, in principle, be detected.
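To make the $ε$-forging set concrete, the following sketch instantiates it for the linear-regression case discussed in the abstract: for a fixed parameter vector $w$ and per-sample squared loss $(w \cdot x - y)^2$, a candidate point $(x', y')$ lies in the $ε$-forging set of a target $(x_t, y_t)$ if its loss gradient is within $ε$ of the target's gradient. This is an illustrative Monte Carlo estimate (the dimensions, distributions, and tolerance values are arbitrary assumptions, not the paper's experimental setup); it simply shows that the fraction of random candidates falling in the forging set shrinks as $ε$ decreases, in line with the paper's measure-decay and probability bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                        # data dimension (illustrative choice)
w = rng.normal(size=d)       # fixed linear-regression parameters

def grad(x, y):
    # per-sample gradient of the squared loss (w.x - y)^2 with respect to w
    return 2.0 * (w @ x - y) * x

# target point whose gradient an adversary would try to mimic
x_t, y_t = rng.normal(size=d), rng.normal()
g_t = grad(x_t, y_t)

# random candidate points, standing in for a non-degenerate data distribution
n = 200_000
X = rng.normal(size=(n, d))
Y = rng.normal(size=n)
G = 2.0 * (X @ w - Y)[:, None] * X           # all candidate gradients at once
dist = np.linalg.norm(G - g_t, axis=1)       # gradient mismatch per candidate

# fraction of candidates inside the eps-forging set, for shrinking eps
fracs = {eps: float(np.mean(dist <= eps)) for eps in (2.0, 1.0, 0.5, 0.25)}
for eps, f in fracs.items():
    print(f"eps={eps:4.2f}  fraction in forging set: {f:.5f}")
```

Because the forging sets are nested in $ε$, the estimated fractions are monotone nonincreasing as $ε$ shrinks; the rapid drop-off for small $ε$ is the empirical face of the $ε^{(d-r)/2}$ decay the paper proves.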