A Novel Metric for Detecting Memorization in Generative Models for Brain MRI Synthesis

📅 2025-09-20
🤖 AI Summary
To address the risk of sensitive training data being memorized and leaked by generative models that synthesize brain MRI scans, this paper proposes DeepSSIM, a self-supervised memorization detection metric. Methodologically, DeepSSIM integrates structure-preserving augmentations into a self-supervised learning framework, enabling anatomical similarity modeling without pixel-level alignment; it quantifies memorization risk via cosine similarity in the learned embedding space. The study uses latent diffusion models (LDMs) for image generation and trains DeepSSIM with an SSIM-guided similarity objective. Evaluated on 2,195 real MRI scans, DeepSSIM improves F1 score by an average of 52.03% over the best existing method, demonstrating substantial gains in accuracy, robustness, and scalability for memorization detection in medical image synthesis.

📝 Abstract
Deep generative models have emerged as a transformative tool in medical imaging, offering substantial potential for synthetic data generation. However, recent empirical studies highlight a critical vulnerability: these models can memorize sensitive training data, posing significant risks of unauthorized patient information disclosure. Detecting memorization in generative models remains particularly challenging, necessitating scalable methods capable of identifying training data leakage across large sets of generated samples. In this work, we propose DeepSSIM, a novel self-supervised metric for quantifying memorization in generative models. DeepSSIM is trained to: i) project images into a learned embedding space and ii) force the cosine similarity between embeddings to match the ground-truth SSIM (Structural Similarity Index) scores computed in the image space. To capture domain-specific anatomical features, training incorporates structure-preserving augmentations, allowing DeepSSIM to estimate similarity reliably without requiring precise spatial alignment. We evaluate DeepSSIM in a case study involving synthetic brain MRI data generated by a Latent Diffusion Model (LDM) trained under memorization-prone conditions, using 2,195 MRI scans from two publicly available datasets (IXI and CoRR). Compared to state-of-the-art memorization metrics, DeepSSIM achieves superior performance, improving F1 scores by an average of +52.03% over the best existing method. Code and data of our approach are publicly available at the following link: https://github.com/brAIn-science/DeepSSIM.
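The training objective described in the abstract can be sketched in a few lines: a regression loss pushes the cosine similarity of two image embeddings toward the SSIM computed between the images themselves. This is an illustrative toy, not the authors' implementation; the single-window SSIM, the encoder interface, and all names here are assumptions (the paper uses full SSIM maps and a learned network).

```python
import numpy as np

def global_ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified single-window SSIM over whole images with values in [0, 1].
    (The standard metric averages SSIM over local sliding windows.)"""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def deepssim_style_loss(embed, img_pairs):
    """Mean squared error between embedding cosine similarity and
    image-space SSIM, over a batch of image pairs. `embed` is any
    function mapping an image array to an embedding vector."""
    errs = []
    for x, y in img_pairs:
        target = global_ssim(x, y)           # ground truth from image space
        pred = cosine(embed(x), embed(y))    # similarity in embedding space
        errs.append((pred - target) ** 2)
    return float(np.mean(errs))
```

With a perfect encoder the loss approaches zero; for instance, an identity pair has SSIM 1 and cosine similarity 1, so any encoder scores it correctly. During real training, gradients of this loss would update the encoder so that the match holds across augmented, non-aligned pairs as well.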
Problem

Research questions and friction points this paper is trying to address.

Detecting memorization of sensitive training data in generative models
Identifying unauthorized patient information disclosure in medical imaging
Developing scalable methods to quantify training data leakage in synthetic MRI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised metric for quantifying generative model memorization
Learned embedding space matching cosine similarity to SSIM scores
Structure-preserving augmentations for domain-specific anatomical features
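Once such a metric is trained, memorization detection reduces to scoring each generated sample against the training set in embedding space and flagging near-duplicates. The sketch below assumes precomputed embeddings and an arbitrary threshold of 0.95; both the helper names and the threshold are illustrative, not taken from the paper.

```python
import numpy as np

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def flag_memorized(gen_embs, train_embs, threshold=0.95):
    """For each generated embedding, find its nearest training embedding
    by cosine similarity and flag it if the similarity exceeds the
    (assumed) memorization threshold.

    Returns a list of tuples: (gen_index, nearest_train_index,
    similarity, is_flagged)."""
    results = []
    for i, g in enumerate(gen_embs):
        sims = [cosine(g, t) for t in train_embs]
        j = int(np.argmax(sims))
        results.append((i, j, sims[j], sims[j] >= threshold))
    return results
```

Because scoring is a pairwise similarity search over fixed-length embeddings rather than a pixel-level registration of full MRI volumes, this kind of check scales to large sets of generated samples, which is the scalability point the paper emphasizes.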