Towards Black-Box Membership Inference Attack for Diffusion Models

📅 2024-05-25
🏛️ arXiv.org
📈 Citations: 4
Influential: 1
📄 PDF
🤖 AI Summary
Copyright attribution for diffusion model training data remains challenging, particularly under black-box API access where internal model parameters and architecture are unavailable. Method: This paper proposes the first membership inference attack (MIA) tailored to black-box image-to-image variation APIs (e.g., the Stable Diffusion API), enabling detection of whether a given image was used in training without access to the U-Net architecture or model weights. It introduces a noise-prediction deviation statistics framework for black-box MIA and extends MIA to Diffusion Transformer architectures for the first time, incorporating multi-step denoising residual analysis and consistency modeling. Results: The method significantly outperforms existing MIAs on DDIM and Stable Diffusion, achieving up to a 12.7% improvement in attack accuracy. It further demonstrates strong generalizability on Diffusion Transformers while eliminating reliance on gradients or model weights, a critical advancement for practical copyright auditing in commercial diffusion API settings.

📝 Abstract
Given the rising popularity of AI-generated art and the associated copyright concerns, identifying whether an artwork was used to train a diffusion model is an important research topic. This work approaches the problem from the membership inference attack (MIA) perspective. We first identify the limitation of applying existing MIA methods to proprietary diffusion models: the required access to the internal U-Net. To address this problem, we introduce a novel membership inference attack method that uses only the image-to-image variation API and operates without access to the model's internal U-Net. Our method is based on the intuition that the model can more easily obtain an unbiased noise-prediction estimate for images from the training set. By applying the API multiple times to the target image, averaging the outputs, and comparing the result to the original image, our approach can classify whether a sample was part of the training set. We validate our method on DDIM and Stable Diffusion setups and further extend both our approach and existing algorithms to the Diffusion Transformer architecture. Our experimental results consistently outperform previous methods.
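The query-average-compare procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `api_variation` is a hypothetical stand-in for a black-box image-to-image variation endpoint, and the score and threshold calibration are assumptions.

```python
import numpy as np

def membership_score(image, api_variation, n_queries=16, strength=0.5):
    """Score an image (H x W x C float array in [0, 1]) for training-set
    membership using only black-box variation queries.

    `api_variation` is a hypothetical wrapper around an image-to-image
    variation API (e.g. a Stable Diffusion endpoint): it takes an image and
    a variation strength and returns a varied image of the same shape.
    """
    # Query the API multiple times on the same target image.
    variations = [api_variation(image, strength) for _ in range(n_queries)]
    # Averaging the outputs approximates the model's reconstruction of the
    # input; per the paper's intuition, the noise-prediction estimate is
    # less biased for training members, so the average stays closer to the
    # original image.
    avg = np.mean(variations, axis=0)
    # Lower reconstruction error means more likely a member; negate the MSE
    # so that higher scores indicate membership.
    return -float(np.mean((avg - image) ** 2))

def is_member(image, api_variation, threshold, **kwargs):
    # The threshold would be calibrated on images with known membership.
    return membership_score(image, api_variation, **kwargs) > threshold
```

With a well-behaved API, members should yield scores near zero (low reconstruction error) and non-members more negative scores; classification then reduces to thresholding.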
Problem

Research questions and friction points this paper is trying to address.

Identifying if artwork was used to train diffusion models
Overcoming limitations of existing MIA methods for proprietary models
Developing a black-box MIA method without internal U-net access
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses image-to-image variation API only
Averages outputs for noise prediction comparison
Extends method to Diffusion Transformer architecture
Jingwei Li
Tsinghua University
Jingyi Dong
The Chinese University of Hong Kong, Shenzhen
Tianxing He
Tsinghua University
Jingzhao Zhang
Tsinghua University