Anonymity Unveiled: A Practical Framework for Auditing Data Use in Deep Learning Models

📅 2024-09-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address privacy loss arising from the unauthorized use of users’ sensitive data, such as facial images, in training deep learning models, this paper proposes MembershipTracker, a lightweight data membership auditing framework designed for non-expert users. It combines a data marking step, which applies small and targeted changes to the user's data, with a customized membership inference (MI) verification step. Users need to mark only 0.005%–0.1% of the training set, and the reported result is an average 0% false positive rate (FPR) at 100% true positive rate (TPR). The aim is to ease the trade-off between practicality and reliability in existing privacy auditing methods. Technically, the marking step is designed so that a model trained on the marked samples memorizes them strongly, and the verification step checks whether the audited model exhibits that memorization. The framework is also evaluated against anti-auditing countermeasures, including gradient masking, label smoothing, and differential privacy, and on industry-scale benchmarks such as ImageNet-1k.

📝 Abstract

The rise of deep learning (DL) has led to a surging demand for training data, which incentivizes the creators of DL models to trawl through the Internet for training materials. Meanwhile, users often have limited control over whether their data (e.g., facial images) are used to train DL models without their consent, which has engendered pressing concerns. This work proposes MembershipTracker, a practical data auditing tool that can empower ordinary users to reliably detect the unauthorized use of their data in training DL models. We view data auditing through the lens of membership inference (MI). MembershipTracker consists of a lightweight data marking component to mark the target data with small and targeted changes, which can be strongly memorized by the model trained on them; and a specialized MI-based verification process to audit whether the model exhibits strong memorization on the target samples. MembershipTracker only requires the users to mark a small fraction of data (0.005% to 0.1% in proportion to the training set), and it enables the users to reliably detect the unauthorized use of their data (average 0% FPR@100% TPR). We show that MembershipTracker is highly effective across various settings, including industry-scale training on the full-size ImageNet-1k dataset. We finally evaluate MembershipTracker under multiple classes of countermeasures.
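
The metric "FPR@100% TPR" quoted above can be made concrete with a small sketch. It assumes each sample receives a scalar membership score where higher means "more likely used in training"; the function name and the example scores are hypothetical and are not taken from the paper.

```python
import numpy as np

def fpr_at_full_tpr(member_scores, nonmember_scores):
    """False positive rate when the threshold is set so that every member is flagged."""
    member_scores = np.asarray(member_scores, dtype=float)
    nonmember_scores = np.asarray(nonmember_scores, dtype=float)
    # The highest threshold that still flags all members (TPR = 100%).
    threshold = member_scores.min()
    # Fraction of non-members that are wrongly flagged at that threshold.
    return float(np.mean(nonmember_scores >= threshold))

# Hypothetical scores: members are cleanly separated from non-members.
separated = fpr_at_full_tpr([0.97, 0.98, 0.99], [0.10, 0.35, 0.60, 0.20])

# Hypothetical scores: one non-member scores above the weakest member.
overlapping = fpr_at_full_tpr([0.97, 0.98, 0.99], [0.10, 0.98, 0.60, 0.20])

print(separated, overlapping)
```

A value of 0.0 means every audited member is detected without a single non-member being misclassified, which is the regime the abstract reports on average.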

Problem

Research questions and friction points this paper is trying to address.

Detect unauthorized use of user data in DL models

Enable reliable data auditing via membership inference

Provide practical tool for users to protect their data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight data marking for memorization

Specialized MI-based verification process

Effective with minimal marked data
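
As an illustration of what "lightweight marking" and "minimal marked data" could look like in practice, here is a minimal sketch. It is a generic stand-in, not the paper's procedure: the blending scheme, the `blend` ratio, the random pattern, and the 1,000,000-sample training set are all assumptions chosen for the example.

```python
import numpy as np

def mark_image(image, pattern, blend=0.1):
    """Blend a user-chosen pattern into an image; both are float arrays in [0, 1]."""
    if image.shape != pattern.shape:
        raise ValueError("image and pattern must have the same shape")
    marked = (1.0 - blend) * image + blend * pattern
    return np.clip(marked, 0.0, 1.0)

def marked_count(train_size, fraction_percent):
    """Number of samples corresponding to a percentage of the training set."""
    return round(train_size * fraction_percent / 100)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))    # hypothetical user image
pattern = rng.random((32, 32, 3))  # hypothetical marking pattern

marked = mark_image(image, pattern, blend=0.1)
# The per-pixel change is bounded by the blend ratio, so the mark stays small.
max_change = float(np.abs(marked - image).max())

# For a hypothetical training set of 1,000,000 samples, the quoted
# 0.005% to 0.1% range corresponds to 50 to 1,000 marked samples.
low = marked_count(1_000_000, 0.005)
high = marked_count(1_000_000, 0.1)
print(max_change, low, high)
```

The point of the sketch is only that the per-sample change and the number of samples a user must mark can both be small; how the actual marks are constructed is described in the paper itself.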
