No Free Lunch for Synthetic Images under Data Scarcity Conditions

📅 2026-06-01

🤖 AI Summary

This study addresses the challenge of balancing fidelity, privacy, and downstream utility in synthetic image generation under data-scarce and privacy-sensitive conditions. The authors propose an evaluation framework that jointly assesses these three dimensions. They use it to compare variational autoencoders (VAEs), generative adversarial networks (GANs), and denoising diffusion probabilistic models (DDPMs) on the MNIST, OCTMNIST, and OrganAMNIST datasets, with differential privacy mechanisms introduced during training to measure their impact. GANs and DDPMs prove more robust to privacy constraints, retaining higher fidelity and downstream utility across a range of noise levels. VAE performance degrades more rapidly as privacy constraints tighten. These results show that generative model families respond differently to privacy-preserving training and offer empirical guidance for model selection in privacy-sensitive applications.
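The summary lists downstream utility as one of the three evaluation dimensions but does not say how it is measured. This sketch assumes one widely used protocol, train-on-synthetic, test-on-real (TSTR); the protocol is not taken from the paper. TSTR fits a classifier on generated images and their labels, then reports its accuracy on held-out real images. A hypothetical nearest-centroid classifier on flattened feature vectors stands in for whatever downstream model the authors used. The names `fit_centroids`, `predict`, and `tstr_accuracy` are illustrative only.

```python
import math


def fit_centroids(features, labels):
    """Per-class mean feature vector (hypothetical stand-in for a downstream classifier)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is nearest to x in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def tstr_accuracy(synth_x, synth_y, real_x, real_y):
    """Train on synthetic samples, test on real samples; accuracy in [0, 1].

    Assumed TSTR protocol for illustration, not the paper's stated method.
    """
    centroids = fit_centroids(synth_x, synth_y)
    correct = sum(predict(centroids, x) == y for x, y in zip(real_x, real_y))
    return correct / len(real_y)
```

A higher TSTR accuracy suggests that the synthetic set preserves more of the class structure a downstream model needs. Comparing it against a classifier trained on the real data gives a reference point for how much utility is lost.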

📝 Abstract

This study investigates the trade-offs between fidelity, privacy, and utility in synthetic data generation under conditions of data scarcity and privacy sensitivity. We propose an evaluation framework that jointly assesses these three dimensions and apply it to three widely used generative models: VAE, GAN, and DDPM. The evaluation spans three image datasets (MNIST, OCTMNIST, and OrganAMNIST), covering both general-purpose and medical imaging domains. The three models behave notably differently when differential privacy mechanisms are introduced during training. GAN and DDPM demonstrate greater robustness, maintaining higher fidelity and downstream utility across a range of noise levels, while VAE degrades more rapidly as privacy constraints increase. This study highlights the importance of multidimensional evaluation of deep generative models and shows that their behaviour differs significantly when privacy techniques are applied.
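The abstract refers to differential privacy mechanisms introduced during training and to "a range of noise levels," but it does not name the mechanism. This minimal sketch assumes DP-SGD, a common choice for private deep learning, to show where a noise level enters training. In DP-SGD, each per-example gradient is clipped to a maximum L2 norm, the clipped gradients are summed, and Gaussian noise scaled by a noise multiplier is added before averaging. The function names and the parameters `max_norm` and `noise_multiplier` are hypothetical, not the authors' code. The sketch also omits privacy accounting, the step that converts the noise multiplier into an (epsilon, delta) guarantee.

```python
import math
import random


def l2_norm(vec):
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(x * x for x in vec))


def clip(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = l2_norm(grad)
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Privatized average gradient for one minibatch (assumed DP-SGD mechanism).

    1. Clip every per-example gradient to L2 norm <= max_norm.
    2. Sum the clipped gradients.
    3. Add Gaussian noise with std noise_multiplier * max_norm per coordinate.
    4. Divide by the batch size.
    A larger noise_multiplier means stronger privacy and noisier updates.
    """
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm
    return [(t + rng.gauss(0.0, sigma)) / batch_size for t in total]
```

Sweeping `noise_multiplier` upward while keeping everything else fixed is one way to produce the kind of "range of noise levels" the abstract describes. Clipping bounds each example's influence on the update, so the added noise can mask any single training image.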

Problem

Research questions and friction points this paper is trying to address.

synthetic data

data scarcity

privacy

fidelity

utility

Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic data

differential privacy

generative models