🤖 AI Summary
This work addresses the high computational cost, utility degradation, and prediction mismatch commonly incurred by existing membership privacy protection methods that require full model retraining or weight updates. The authors observe that privacy vulnerabilities are concentrated in a very small subset of critical weights, whose sensitivity stems from their structural positions rather than their numerical values. They further uncover an entanglement between privacy vulnerability and model learnability at these specific weights. Building on these insights, they propose a lightweight defense: instead of discarding entire neurons, they score the critical weights, selectively roll them back, and then fine-tune the model. Extensive experiments demonstrate that this approach substantially enhances robustness against membership inference attacks while effectively preserving model utility across diverse settings.
📝 Abstract
Prior approaches to membership privacy preservation usually update or retrain all weights in a neural network, which is costly and can cause unnecessary utility loss, or even worsen the misalignment in predictions between training and non-training data. In this work, we make three observations: i) privacy vulnerability is concentrated in a very small fraction of weights; ii) however, most of those weights also critically affect utility; iii) the importance of a weight stems from its location rather than its value. Guided by these insights, to preserve privacy we score the critical weights and, instead of discarding the corresponding neurons, rewind only those weights before fine-tuning. Through extensive experiments, we show that this mechanism exhibits superior resilience against membership inference attacks in most cases while maintaining utility.
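The core rollback step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the per-weight sensitivity scores, the rollback fraction, and the use of an early training checkpoint as the rewind target are all placeholders, not the paper's exact scoring criterion or procedure.

```python
import numpy as np

def rewind_critical_weights(weights, checkpoint, scores, frac=0.01):
    """Roll back the top-`frac` highest-scoring weights to their
    early-checkpoint values, leaving all other weights untouched.

    `scores` is a hypothetical per-weight privacy-sensitivity score;
    the paper's actual scoring function is not reproduced here.
    Returns the rewound weight tensor and the flat indices rewound.
    """
    flat_scores = scores.ravel()
    k = max(1, int(frac * flat_scores.size))
    # Flat indices of the k most privacy-critical weights.
    critical = np.argpartition(flat_scores, -k)[-k:]
    out = weights.copy().ravel()
    out[critical] = checkpoint.ravel()[critical]
    return out.reshape(weights.shape), critical

# Toy demo: final weights, an earlier checkpoint, and random scores.
rng = np.random.default_rng(0)
w_final = rng.normal(size=(4, 4))
w_ckpt = rng.normal(size=(4, 4))
scores = rng.random((4, 4))
w_new, idx = rewind_critical_weights(w_final, w_ckpt, scores, frac=0.1)
# In the full method, w_new would then be briefly fine-tuned
# to recover any utility lost by the rollback.
```

Because only a tiny fraction of weights is touched, the subsequent fine-tuning pass is short, which is what keeps the defense lightweight compared with full retraining.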