Revealing the Unseen: Guiding Personalized Diffusion Models to Expose Training Data

📅 2024-10-03

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 1

🤖 AI Summary

This study shows that publicly shared fine-tuned diffusion models pose risks of training data leakage and copyright infringement. To investigate this, the authors first model fine-tuning as a progressive shift in the underlying data distribution and propose a generation guidance mechanism based on out-of-distribution extrapolation: high-likelihood samples are extrapolated from the discrepancy between the source and fine-tuned models, then refined via image clustering for efficient data reconstruction. The method recovers approximately 20% of the original fine-tuning images on WikiArt, DreamBooth, and real-world online models, substantially outperforming existing baselines. The core contributions are: (1) establishing a theoretical link between fine-tuning and distributional shift; and (2) introducing a scalable data extraction framework that requires no access to training logs or internal model states. This work provides a novel paradigm for security auditing and copyright governance of generative models.

📝 Abstract

Diffusion Models (DMs) have evolved into advanced image generation tools, especially for few-shot fine-tuning where a pretrained DM is fine-tuned on a small set of images to capture specific styles or objects. Many people upload these personalized checkpoints online, fostering communities such as Civitai and HuggingFace. However, model owners may overlook the potential risks of data leakage by releasing their fine-tuned checkpoints. Moreover, concerns regarding copyright violations arise when unauthorized data is used during fine-tuning. In this paper, we ask: "Can training data be extracted from these fine-tuned DMs shared online?" A successful extraction would not only present data leakage threats but also offer tangible evidence of copyright infringement. To answer this, we propose FineXtract, a framework for extracting fine-tuning data. Our method approximates fine-tuning as a gradual shift in the model's learned distribution, from the original pretrained DM toward the fine-tuning data. By extrapolating the models before and after fine-tuning, we guide the generation toward high-probability regions within the fine-tuned data distribution. We then apply a clustering algorithm to extract the most probable images from those generated using this extrapolated guidance. Experiments on DMs fine-tuned with datasets such as WikiArt, DreamBooth, and real-world checkpoints posted online validate the effectiveness of our method, extracting approximately 20% of fine-tuning data in most cases, significantly surpassing baseline performance.
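
The extrapolation step described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the linear form, the function name `extrapolated_prediction`, and the guidance weight `w` are assumptions made for illustration. The idea shown is that the difference between the fine-tuned and pretrained predictions points toward the fine-tuning data, so stepping further along that difference pushes generation past what the fine-tuned model alone would produce.

```python
import numpy as np


def extrapolated_prediction(pred_pretrained, pred_finetuned, w):
    """Extrapolate past the fine-tuned prediction (hypothetical linear form).

    w = 0 returns the fine-tuned prediction unchanged; w > 0 moves further
    along the direction (fine-tuned - pretrained). The weight `w` is an
    assumed hyperparameter, not a value taken from the paper.
    """
    pred_pretrained = np.asarray(pred_pretrained, dtype=float)
    pred_finetuned = np.asarray(pred_finetuned, dtype=float)
    return pred_finetuned + w * (pred_finetuned - pred_pretrained)


# Toy 1-D illustration with unit-variance Gaussian scores, score(x) = -(x - mean).
# Pretrained mean 0, fine-tuned mean 1: the extrapolated score equals the score
# of a Gaussian whose mean has moved further in the same direction, to 1 + w.
def score(x, mean):
    return -(x - mean)


x = np.linspace(-2.0, 4.0, 7)
w = 2.0
guided = extrapolated_prediction(score(x, 0.0), score(x, 1.0), w)
shifted_mean_score = score(x, 1.0 + w)
```

In the toy case, `guided` and `shifted_mean_score` coincide, which is the sense in which extrapolating the two models shifts sampling further along the direction that fine-tuning introduced.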

Problem

Research questions and friction points this paper is trying to address.

Extracting training data from fine-tuned diffusion models

Assessing data leakage risks in shared checkpoints

Detecting copyright violations in fine-tuning datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts training data via model guidance

Uses clustering to identify probable images

Approximates fine-tuning as distribution shift
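
The clustering idea listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes each generated image has already been mapped to a feature vector, groups vectors whose cosine similarity exceeds a hypothetical `threshold`, and returns the most central member (medoid) of each sufficiently large group. The function name `cluster_medoids` and the parameters `threshold` and `min_size` are invented for this example.

```python
import numpy as np


def cluster_medoids(features, threshold=0.9, min_size=2):
    """Group near-duplicate feature vectors and return one medoid index per group.

    Two vectors are linked when their cosine similarity is >= threshold;
    groups are the connected components of that link graph. Groups smaller
    than min_size are dropped. Medoids are returned largest group first.
    """
    x = np.asarray(features, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T
    linked = sim >= threshold

    n = len(x)
    assigned = np.zeros(n, dtype=bool)
    clusters = []
    for start in range(n):
        if assigned[start]:
            continue
        # Depth-first search over the similarity graph.
        assigned[start] = True
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(linked[i]):
                if not assigned[j]:
                    assigned[j] = True
                    stack.append(int(j))
        clusters.append(sorted(members))

    clusters = [c for c in clusters if len(c) >= min_size]
    clusters.sort(key=len, reverse=True)

    medoids = []
    for c in clusters:
        within = sim[np.ix_(c, c)]
        # The medoid has the highest total similarity to its own group.
        medoids.append(c[int(np.argmax(within.sum(axis=1)))])
    return medoids


# Toy features: three near-duplicates, two near-duplicates, one outlier.
toy = [[1, 0], [0.99, 0.1], [0.98, -0.05], [0, 1], [0.05, 1], [-1, -1]]
picked = cluster_medoids(toy, threshold=0.9, min_size=2)
```

The intuition is that candidates the guided sampler produces repeatedly are more likely to correspond to a memorized training image, so large groups are ranked first and isolated samples are discarded.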