PURGE: Projected Unlearning via Retain-Guided Erasure

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses machine unlearning: removing specified data from a trained model while preserving performance on retained data and resisting membership inference attacks. Exploiting the duality between continual learning and unlearning, the authors replace the conventional uniform-distribution target with a retain-confusion target (the model's natural confusion pattern on retain data) and adapt A-GEM gradient projection so that each unlearning step does not increase the retain-set loss. Multi-layer representation erasure removes forget-set information from hidden layers, and two self-regulating stopping criteria (a retain-loss budget and a forget-accuracy target) remove the need for manual epoch tuning. Evaluated on 22 class-level forgetting tasks across five datasets, the method keeps retain accuracy above 96% and brings membership inference attack AUROC close to the ideal value of 0.5, outperforming gradient ascent, KL-uniform, and several published baselines.
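The gradient-projection constraint can be illustrated with a minimal sketch of the A-GEM projection rule. It is not the authors' code: the function name `agem_project`, the flat-list gradient representation, and the `eps` guard are assumptions made for illustration. The idea is that if the forget-objective gradient conflicts with the retain-loss gradient (negative dot product), the conflicting component is removed, so that a descent step does not increase the retain loss to first order.

```python
def agem_project(g_forget, g_retain, eps=1e-12):
    """Project g_forget so that it no longer conflicts with g_retain.

    g_forget: gradient of the unlearning objective (flat list of floats).
    g_retain: gradient of the retain-set loss (flat list of floats).
    Both names and the flat-list layout are illustrative assumptions.
    """
    dot = sum(a * b for a, b in zip(g_forget, g_retain))
    if dot >= 0.0:
        # No conflict: stepping along -g_forget does not raise the
        # retain loss to first order, so the gradient is used as-is.
        return list(g_forget)

    ref_sq = sum(b * b for b in g_retain)
    if ref_sq <= eps:
        # Degenerate reference gradient: nothing to project onto.
        return list(g_forget)

    # A-GEM rule: g - (g . g_ref / g_ref . g_ref) * g_ref
    scale = dot / ref_sq
    return [a - scale * b for a, b in zip(g_forget, g_retain)]


# Example: the second coordinate conflicts with the retain gradient
# and is projected away; the first coordinate is untouched.
projected = agem_project([1.0, -2.0], [0.0, 1.0])
print(projected)
```

In a real training loop the two gradients would come from separate backward passes on a forget batch and a retain batch, flattened across all parameters before projection.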

📝 Abstract

We propose PURGE, a machine unlearning algorithm built on a simple but under-exploited observation: continual learning (CL) and machine unlearning (MU) are fundamentally dual problems. CL tries to learn new tasks without forgetting old ones; MU tries to erase specific data without hurting retained performance. Both face the same underlying tension, in opposite directions. PURGE leverages this duality by adapting gradient projection from A-GEM (Chaudhry et al., 2019) so that every unlearning step is constrained not to increase the retain-set loss. On top of this, it performs multi-layer representation erasure, pushing forget-set activations in intermediate layers towards the retain distribution to remove information from hidden representations rather than just suppressing it at the output. A key design choice is the retain-confusion target: rather than pushing forget outputs toward the uniform distribution, which we found to be surprisingly easy for membership inference attacks to detect, we target the model's natural confusion pattern on retain data. This makes the unlearned model hard to distinguish from one retrained from scratch. Two self-regulating stopping criteria (a retain-loss budget and a forget-accuracy target) let the algorithm decide on its own when to stop, removing the need for manual epoch tuning. In experiments on five datasets (CIFAR-10, MNIST, SVHN, STL10, PathMNIST) across 22 class-level forgetting tasks, PURGE consistently keeps retain accuracy above 96% while achieving MIA AUROC close to 0.5 (the ideal), outperforming gradient ascent, KL-uniform, and several published baselines on the privacy-utility frontier.
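The retain-confusion target and the two stopping criteria can be sketched as follows. The abstract does not give their exact definitions, so everything here is an assumption for illustration: the target is taken to be the mean softmax output over retain samples with the forgotten class masked out and renormalized, and the stopping rule is taken to fire when the retain loss exceeds an absolute budget or the forget accuracy drops to a target. The names `retain_confusion_target` and `should_stop` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def retain_confusion_target(retain_logits, forget_class):
    """Assumed target: mean retain softmax, forget class masked out.

    retain_logits: list of per-sample logit lists from the retain set.
    forget_class: index of the class being forgotten.
    """
    n_samples = len(retain_logits)
    n_classes = len(retain_logits[0])
    mean = [0.0] * n_classes
    for logits in retain_logits:
        for k, p in enumerate(softmax(logits)):
            mean[k] += p / n_samples

    mean[forget_class] = 0.0  # the target puts no mass on the forgotten class
    total = sum(mean)
    if total == 0.0:
        raise ValueError("no probability mass left outside the forget class")
    return [p / total for p in mean]


def should_stop(retain_loss, retain_loss_budget, forget_acc, forget_acc_target):
    """Assumed rule: stop once either criterion is met."""
    over_budget = retain_loss > retain_loss_budget
    forgotten = forget_acc <= forget_acc_target
    return over_budget or forgotten


# Example with three classes, forgetting class 2.
target = retain_confusion_target([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], forget_class=2)
print(target)
print(should_stop(0.3, 0.5, 0.9, 0.05))
```

Under these assumptions, forget-set outputs would be pulled toward `target` (for example with a KL divergence term) instead of toward the uniform distribution, and `should_stop` would be checked after each unlearning step.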

Problem

Research questions and friction points this paper is trying to address.

machine unlearning

data erasure

membership inference attack

retain performance

privacy-utility tradeoff

Innovation

Methods, ideas, or system contributions that make the work stand out.

machine unlearning

gradient projection

representation erasure