🤖 AI Summary
Face recognition systems risk leaking information about their training data through membership inference attacks.
Method: This paper proposes MINT, the first large-scale privacy auditing framework for facial images. It features two discriminative architectures, an MLP and a CNN, that learn the activation-pattern disparities between member and non-member samples. MINT establishes a multi-source experimental framework spanning diverse databases and state-of-the-art (SOTA) face recognition models, enabling cross-dataset and cross-model evaluation.
Contribution/Results: Evaluated across six public databases comprising over 22 million face images, MINT achieves up to 90% membership inference accuracy, substantially outperforming existing baselines, and is the first study to empirically demonstrate the feasibility of membership inference in large-scale, realistic face recognition settings. By providing a deployable, scalable privacy assessment tool, MINT advances compliance auditing of training data for large language and vision models.
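The core idea, a discriminator trained to separate member from non-member samples based on the audited model's internal activations, can be sketched as below. This is a minimal illustration, not the paper's implementation: the activation vectors are synthetic stand-ins (a distribution shift between members and non-members is assumed for demonstration), and the one-hidden-layer MLP is a simplified version of the MLP branch described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 500  # hypothetical activation dimensionality and samples per class

# Synthetic stand-ins for activations of the Audited Model:
# member samples (seen during training) are assumed to show a shifted distribution.
members = rng.normal(0.6, 1.0, size=(n, d))
nonmembers = rng.normal(0.0, 1.0, size=(n, d))
X = np.vstack([members, nonmembers])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Shuffle and split into train / held-out test sets.
idx = rng.permutation(2 * n)
X, y = X[idx], y[idx]
split = int(0.8 * 2 * n)
Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]

# One-hidden-layer MLP discriminator trained with plain gradient descent.
h = 16
W1 = rng.normal(0, 0.1, size=(d, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, size=h);      b2 = 0.0
lr = 0.1
for _ in range(300):
    Z = np.maximum(Xtr @ W1 + b1, 0)           # ReLU hidden layer
    p = 1 / (1 + np.exp(-(Z @ W2 + b2)))       # sigmoid membership score
    g = (p - ytr) / len(ytr)                   # BCE gradient w.r.t. logits
    gW2 = Z.T @ g; gb2 = g.sum()
    gZ = np.outer(g, W2) * (Z > 0)             # backprop through ReLU
    gW1 = Xtr.T @ gZ; gb1 = gZ.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# Membership prediction on held-out activations.
Zt = np.maximum(Xte @ W1 + b1, 0)
pred = (1 / (1 + np.exp(-(Zt @ W2 + b2))) > 0.5).astype(float)
acc = (pred == yte).mean()
print(f"membership inference accuracy: {acc:.2f}")
```

In the actual framework, the inputs would be activations extracted from a trained face recognition model rather than Gaussian samples, and the CNN branch would operate on spatially structured activation maps instead of flattened vectors.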
📝 Abstract
This article introduces the Membership Inference Test (MINT), a novel approach that aims to empirically assess whether given data was used during the training of AI/ML models. Specifically, we propose two MINT architectures designed to learn the distinct activation patterns that emerge when an Audited Model is exposed to data used during its training process. These architectures are based on Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). The experimental framework focuses on the challenging task of Face Recognition, considering three state-of-the-art Face Recognition systems. Experiments are carried out using six publicly available databases, comprising over 22 million face images in total. Different experimental scenarios are considered depending on the context of the AI model under test. Our proposed MINT approach achieves promising results, with up to 90% accuracy, indicating its potential to recognize whether an AI model has been trained with specific data. The proposed MINT approach can serve to enforce privacy and fairness in several AI applications, e.g., revealing if sensitive or private data was used for training or tuning Large Language Models (LLMs).