Report of the Medical Image De-Identification (MIDI) Task Group - Best Practices and Recommendations

📅 2023-03-18
🏛️ arXiv.org
📈 Citations: 5
Influential: 2
🤖 AI Summary
Background: Medical imaging data, together with associated structured data such as segmentation maps, radiotherapy dose distributions, and structured reports, pose significant re-identification risks when shared publicly across jurisdictions. Method: We propose the first systematic de-identification technical specification tailored to public-release scenarios, integrating precise DICOM metadata redaction, pixel-level anonymization of sensitive anatomical regions, preservation of semantic consistency for structured objects, and a quantitative privacy risk assessment framework. Contribution/Results: This work establishes the first formally defined technical boundary for medical image de-identification and introduces a standardized, modality-agnostic risk-control framework covering both primary imaging modalities and derived objects. The resulting guidelines constitute an internationally recognized best-practice standard for de-identification, demonstrably reducing re-identification risk while enabling compliant, open scientific sharing of imaging data for AI training and research.
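The summary mentions pixel-level anonymization. As a minimal, hypothetical sketch (not the task group's procedure), the Python below blanks caller-supplied rectangular regions of a single 2-D frame, the kind of step used for burned-in text. The `blank_regions` function, the list-of-rows frame, and the half-open `(row_start, row_stop, col_start, col_stop)` region format are assumptions made for illustration. The sketch does not locate sensitive regions, and it is not a defacing method for volumetric head scans.

```python
from typing import List, Tuple

# Hypothetical representation: a 2-D grayscale frame as a list of pixel rows.
Frame = List[List[int]]
# Hypothetical region format: (row_start, row_stop, col_start, col_stop), half-open.
Region = Tuple[int, int, int, int]


def blank_regions(frame: Frame, regions: List[Region], fill: int = 0) -> Frame:
    """Return a copy of `frame` with every pixel inside each region set to `fill`.

    Regions are clipped to the frame bounds, so a box that extends past an
    edge is handled safely instead of raising an IndexError.
    """
    out = [row[:] for row in frame]  # copy so the source frame is untouched
    n_rows = len(out)
    n_cols = len(out[0]) if n_rows else 0
    for r0, r1, c0, c1 in regions:
        for r in range(max(r0, 0), min(r1, n_rows)):
            for c in range(max(c0, 0), min(c1, n_cols)):
                out[r][c] = fill
    return out


if __name__ == "__main__":
    frame = [[9] * 4 for _ in range(3)]
    # Blank a 1x2 strip in the top-left corner, where an overlay might sit.
    masked = blank_regions(frame, [(0, 1, 0, 2)])
    for row in masked:
        print(row)
    # [0, 0, 9, 9]
    # [9, 9, 9, 9]
    # [9, 9, 9, 9]
```

Copying the frame before editing keeps the source pixel data intact, which makes before/after comparison straightforward during manual review of the masked output.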
📝 Abstract
This report addresses the technical aspects of de-identification of medical images of human subjects and biospecimens, such that re-identification risk of ethical, moral, and legal concern is sufficiently reduced to allow unrestricted public sharing for any purpose, regardless of the jurisdiction of the source and distribution sites. All medical images, regardless of the mode of acquisition, are considered, though the primary emphasis is on those with accompanying data elements, especially those encoded in formats in which the data elements are embedded, particularly Digital Imaging and Communications in Medicine (DICOM). These images include image-like objects such as Segmentations, Parametric Maps, and Radiotherapy (RT) Dose objects. The scope also includes related non-image objects, such as RT Structure Sets, Plans and Dose Volume Histograms, Structured Reports, and Presentation States. Only de-identification of publicly released data is considered, and alternative approaches to privacy preservation, such as federated learning for artificial intelligence (AI) model development, are out of scope, as are issues of privacy leakage from AI model sharing. Only technical issues of public sharing are addressed.
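The abstract emphasizes embedded data elements and derived objects (Segmentations, RT Structure Sets) that reference source images. A minimal sketch, assuming a hypothetical flat dictionary of `(group, element)` tags in place of a real DICOM parser, illustrates three common metadata steps: emptying or removing identifying attributes, dropping private (odd-group) elements, and remapping UIDs with a keyed hash so that a derived object's reference still resolves to its remapped source. The tag lists and actions are an illustrative subset, not the report's recommendations or the full DICOM PS3.15 Annex E profile; `deidentify`, `remap_uid`, and the secret key are hypothetical.

```python
import hashlib
import hmac
import uuid
from typing import Dict, Tuple

Tag = Tuple[int, int]      # (group, element)
Dataset = Dict[Tag, str]   # hypothetical flat stand-in for a DICOM data set

# Illustrative subset only; the real attribute list and actions are far longer.
EMPTY_TAGS = {
    (0x0010, 0x0010),  # Patient's Name
    (0x0010, 0x0030),  # Patient's Birth Date
    (0x0008, 0x0050),  # Accession Number
}
REMOVE_TAGS = {
    (0x0008, 0x0080),  # Institution Name
}
UID_TAGS = {
    (0x0020, 0x000D),  # Study Instance UID
    (0x0020, 0x000E),  # Series Instance UID
    (0x0008, 0x0018),  # SOP Instance UID
    (0x0008, 0x1155),  # Referenced SOP Instance UID
}
PATIENT_ID = (0x0010, 0x0020)
PATIENT_IDENTITY_REMOVED = (0x0012, 0x0062)


def remap_uid(uid: str, secret: bytes) -> str:
    """Map a UID to a new one, deterministically for a given secret key."""
    digest = hmac.new(secret, uid.encode("ascii"), hashlib.sha256).digest()
    # Shape the first 16 bytes into a valid UUID, then use the 2.25 root
    # that DICOM reserves for UUID-derived UIDs (at most 44 characters).
    as_uuid = uuid.UUID(bytes=digest[:16], version=4)
    return "2.25." + str(as_uuid.int)


def deidentify(ds: Dataset, secret: bytes, pseudonym: str) -> Dataset:
    """Return a de-identified copy of a flat tag -> value mapping."""
    out: Dataset = {}
    for tag, value in ds.items():
        group = tag[0]
        if group % 2 == 1 or tag in REMOVE_TAGS:
            continue  # drop private (odd-group) elements and removed attributes
        if tag in EMPTY_TAGS:
            out[tag] = ""  # keep the attribute, but with a zero-length value
        elif tag == PATIENT_ID:
            out[tag] = pseudonym
        elif tag in UID_TAGS:
            out[tag] = remap_uid(value, secret)
        else:
            out[tag] = value
    out[PATIENT_IDENTITY_REMOVED] = "YES"  # Patient Identity Removed
    return out


if __name__ == "__main__":
    key = b"site-held-secret"  # hypothetical key kept by the releasing site
    image = {
        (0x0010, 0x0010): "DOE^JANE",
        (0x0010, 0x0020): "MRN12345",
        (0x0008, 0x0018): "1.2.3.4.100",
        (0x0008, 0x0060): "CT",
        (0x0009, 0x0010): "VENDOR PRIVATE",
    }
    seg = {
        (0x0010, 0x0010): "DOE^JANE",
        (0x0008, 0x0018): "1.2.3.4.200",
        (0x0008, 0x1155): "1.2.3.4.100",  # references the image above
        (0x0008, 0x0060): "SEG",
    }
    img_out = deidentify(image, key, "SUBJ-001")
    seg_out = deidentify(seg, key, "SUBJ-001")
    print(seg_out[(0x0008, 0x1155)] == img_out[(0x0008, 0x0018)])  # True
    print(img_out[(0x0010, 0x0010)] == "", (0x0009, 0x0010) in img_out)  # True False
```

Keying the UID hash with a secret held by the releasing site keeps the remapping consistent across every object in a release while preventing outsiders from recomputing it from known original UIDs. Real Segmentation and RT objects carry their references inside nested sequences, which this flat model omits.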
Problem

Reducing re-identification risk in medical images
Ensuring unrestricted public sharing of de-identified data
Focusing on DICOM and related non-image objects
Innovation

De-identification of medical images and biospecimens
Focus on DICOM format and embedded data elements
Excludes AI privacy issues and federated learning
Authors

D. Clunie
A. Flanders
Adam J Taylor
Brad Erickson
Brian Bialecki
David Brundage
David Gutman
F. Prior
J. Seibert
J. Perry
J. Gichoya
J. Kirby
Katherine P. Andriole
Luke Geneslaw
S. Moore
TJ Fitzgerald
Wyatt M. Tellis
Ying Xiao
Keyvan Farahani (Senior Data Science, Imaging and AI Program Director, NHLBI, NIH; Imaging AI & Image-guided interventions)
James Luo
Alex Rosenthal
Kris Kandarpa
Rebecca Rosen
Kerry Goetz
Debra Babcock
Ben Xu
John Hsiao