🤖 AI Summary
Medical imaging data—along with associated structured data such as segmentation maps, radiotherapy dose distributions, and structured reports—pose significant re-identification risks when shared publicly across jurisdictions.
Method: We propose the first systematic de-identification technical specification tailored to public release scenarios, integrating precise DICOM metadata redaction, pixel-level anonymization of sensitive anatomical regions, semantic consistency preservation for structured objects, and a quantitative privacy risk assessment framework.
Contribution/Results: This work establishes the first formally defined technical boundary for medical image de-identification and introduces a standardized, modality-agnostic risk control framework encompassing both primary imaging modalities and derived objects. The resulting guidelines constitute an internationally recognized best-practice standard for de-identification, demonstrably reducing re-identification risk while enabling compliant, open scientific sharing of imaging data for AI training and research.
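The metadata-redaction component described above can be sketched as a rule-based allow-list pass over a parsed dataset. This is a minimal illustration, not the report's actual specification: it assumes the DICOM attributes have already been parsed into a plain dict of keyword/value pairs, and the action lists here (remove, dummy, keep) are illustrative stand-ins loosely inspired by DICOM PS3.15 Annex E, not the full profile.

```python
# Illustrative tag actions; real profiles (e.g. PS3.15 Annex E) are far larger.
REMOVE = {"PatientName", "PatientID", "PatientBirthDate", "OtherPatientIDs"}
DUMMY = {"AccessionNumber": "ANON", "StationName": "REDACTED"}
KEEP = {"Modality", "Rows", "Columns", "PixelSpacing"}

def redact(dataset: dict) -> dict:
    """Return a de-identified copy of a keyword->value dataset:
    remove direct identifiers, substitute dummies for quasi-identifiers,
    retain attributes needed for scientific reuse, and conservatively
    drop anything not explicitly allow-listed."""
    out = {}
    for keyword, value in dataset.items():
        if keyword in REMOVE:
            continue                       # direct identifier: remove entirely
        if keyword in DUMMY:
            out[keyword] = DUMMY[keyword]  # replace with a fixed dummy value
        elif keyword in KEEP:
            out[keyword] = value           # safe technical attribute: retain
        # default: unrecognized attributes are dropped (allow-list policy)
    return out
```

A conservative allow-list (drop by default) is generally preferred over a block-list for public release, since new or private attributes that the rules have never seen cannot leak through.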
📝 Abstract
This report addresses the technical aspects of de-identification of medical images of human subjects and biospecimens, such that re-identification risk of ethical, moral, and legal concern is sufficiently reduced to allow unrestricted public sharing for any purpose, regardless of the jurisdiction of the source and distribution sites. All medical images, regardless of the mode of acquisition, are considered, though the primary emphasis is on those with accompanying data elements, especially those encoded in formats in which the data elements are embedded, particularly Digital Imaging and Communications in Medicine (DICOM). These images include image-like objects such as Segmentations, Parametric Maps, and Radiotherapy (RT) Dose objects. The scope also includes related non-image objects, such as RT Structure Sets, Plans, Dose Volume Histograms, Structured Reports, and Presentation States. Only de-identification of publicly released data is considered; alternative approaches to privacy preservation, such as federated learning for artificial intelligence (AI) model development, are out of scope, as are issues of privacy leakage from AI model sharing. Only technical issues of public sharing are addressed.