An Efficient Framework for Crediting Data Contributors of Diffusion Models

📅 2024-06-09
📈 Citations: 1
Influential: 1
📄 PDF
🤖 AI Summary
This work addresses the challenge of attributing individual data providers' contributions to a diffusion model's global performance properties, such as image quality, diversity, and aesthetic appeal. The authors propose an efficient, scalable framework for approximating Shapley values that integrates structured model pruning with LoRA-based fine-tuning, estimating contributions without repeated full-model retraining. This combination preserves the fairness guarantees of Shapley-based attribution while drastically improving computational efficiency. The method further incorporates multi-task, inference-based evaluation to enable fine-grained quantification of data contributions. Experiments on CIFAR-10, CelebA-HQ, and a Post-Impressionist art dataset show that the framework outperforms existing attribution methods at identifying critical data contributors. These results provide a robust technical foundation for designing data-sharing incentive mechanisms and equitable compensation policies in collaborative generative modeling.
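As background for the summary above, the Shapley value it refers to has a standard game-theoretic definition (this is the textbook formula, not notation taken from the paper): for a set of contributors $N$ and a utility function $v$ mapping a subset of contributors to a global model property, contributor $i$'s value is

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
\frac{|S|!\,\big(|N| - |S| - 1\big)!}{|N|!}
\Big( v\big(S \cup \{i\}\big) - v(S) \Big)
```

The sum ranges over all $2^{|N|-1}$ coalitions excluding $i$, which is why exact computation requires retraining on exponentially many data subsets and motivates the approximation scheme described here.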

📝 Abstract
As diffusion models are deployed in real-world settings, and their performance is driven by training data, appraising the contribution of data contributors is crucial to creating incentives for sharing quality data and to implementing policies for data compensation. Depending on the use case, model performance corresponds to various global properties of the distribution learned by a diffusion model (e.g., overall aesthetic quality). Hence, here we address the problem of attributing global properties of diffusion models to data contributors. The Shapley value provides a principled approach to valuation by uniquely satisfying game-theoretic axioms of fairness. However, estimating Shapley values for diffusion models is computationally impractical because it requires retraining on many training data subsets corresponding to different contributors and rerunning inference. We introduce a method to efficiently retrain and rerun inference for Shapley value estimation, by leveraging model pruning and fine-tuning. We evaluate the utility of our method with three use cases: (i) image quality for a DDPM trained on a CIFAR dataset, (ii) demographic diversity for an LDM trained on CelebA-HQ, and (iii) aesthetic quality for a Stable Diffusion model LoRA-finetuned on Post-Impressionist artworks. Our results empirically demonstrate that our framework can identify important data contributors across models' global properties, outperforming existing attribution methods for diffusion models.
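To make the abstract's cost argument concrete, below is a minimal sketch of the standard permutation-sampling Monte Carlo estimator for Shapley values. The `utility` callback is a hypothetical placeholder: in the paper's setting each call would require (pruned, fine-tuned) retraining and rerunning inference, which is exactly the expense their framework reduces. The function names and the toy additive utility in the usage example are illustrative assumptions, not the authors' implementation.

```python
import random

def shapley_monte_carlo(contributors, utility, num_samples=200, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled orderings of contributors.

    `utility` maps a frozenset of contributors to a scalar score for
    a global model property (e.g., image quality). Here it is any
    caller-supplied function; in the paper's setting evaluating it
    would involve retraining and inference.
    """
    rng = random.Random(seed)
    values = {c: 0.0 for c in contributors}
    for _ in range(num_samples):
        order = list(contributors)
        rng.shuffle(order)
        coalition = frozenset()
        prev_utility = utility(coalition)
        # Add contributors one by one; credit each with its marginal gain.
        for c in order:
            coalition = coalition | {c}
            new_utility = utility(coalition)
            values[c] += new_utility - prev_utility
            prev_utility = new_utility
    return {c: v / num_samples for c, v in values.items()}
```

For an additive toy utility (each contributor adds a fixed amount), the estimator recovers each contributor's weight exactly, and the estimates sum to the grand-coalition utility, illustrating the efficiency axiom the abstract alludes to.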
Problem

Research questions and friction points this paper is trying to address.

Diffusion Model
Data Contribution Assessment
Fair Compensation Policy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shapley Value
Pruning and Fine-tuning
Diffusion Model Analysis
Chris Lin
Paul G. Allen School of Computer Science & Engineering, University of Washington
Mingyu Lu
Paul G. Allen School of Computer Science & Engineering, University of Washington
Chanwoo Kim
Paul G. Allen School of Computer Science & Engineering, University of Washington
Su-In Lee
Paul G. Allen School of Computer Science & Engineering, University of Washington
AI/ML
Computational biology & medicine