De-identification is not enough: a comparison between de-identified and synthetic clinical notes

📅 2024-01-31
🏛️ Scientific Reports
📈 Citations: 5 (influential: 0)
🤖 AI Summary
This study investigates whether de-identification is sufficient to protect privacy in clinical text, and evaluates synthetic data as an alternative. We conduct the first systematic comparison between de-identified clinical notes and synthetic clinical notes generated by large language models and enhanced with differential privacy, jointly assessing privacy preservation and downstream utility. We propose a novel two-dimensional evaluation framework based on real-world re-identification attack success rates and NLP task performance. The results show that de-identified notes remain vulnerable to re-identification and suffer from low semantic fidelity. In contrast, synthetic notes reduce re-identification rates to below 0.5% while achieving an F1 score of 89.2% on clinical named entity recognition, compared with 73.6% for the de-identified counterparts. Differentially private synthetic data therefore delivers strong privacy guarantees and high task utility at the same time, offering a robust alternative to conventional de-identification.

Problem

Research questions and friction points this paper is trying to address.

De-identification alone is insufficient against membership inference attacks.
Synthetic clinical notes are evaluated for both privacy and downstream performance.
Trade-offs between synthetic and real clinical notes are explored.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses large language models to generate synthetic clinical notes.
Evaluates the synthetic notes on downstream clinical tasks.
Proposes a membership inference attack on synthetic data.
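This page does not spell out how the attacks are run. As a hedged illustration of what a membership inference attack measures, here is a minimal sketch of a generic loss-threshold attack. This is a common baseline in the literature and not necessarily the attack proposed in this paper. It assumes the attacker can observe the target model's per-record loss. The function names and the loss values are hypothetical.

```python
from typing import List, Sequence, Tuple


def loss_threshold_attack(losses: Sequence[float], threshold: float) -> List[bool]:
    """Flag each record as a suspected training member when the target
    model's loss on it is at or below `threshold` (models usually fit their
    training records more closely than unseen records)."""
    return [loss <= threshold for loss in losses]


def membership_advantage(member_losses: Sequence[float],
                         nonmember_losses: Sequence[float],
                         threshold: float) -> float:
    """True-positive rate minus false-positive rate of the attack.
    0.0 means the attacker does no better than random guessing."""
    tpr = sum(loss_threshold_attack(member_losses, threshold)) / len(member_losses)
    fpr = sum(loss_threshold_attack(nonmember_losses, threshold)) / len(nonmember_losses)
    return tpr - fpr


def best_threshold(member_losses: Sequence[float],
                   nonmember_losses: Sequence[float]) -> Tuple[float, float]:
    """Try every observed loss as a threshold and keep the one with the
    highest advantage; ties go to the smallest threshold. This is an
    optimistic (attacker-favourable) estimate, since it uses the true labels."""
    candidates = sorted(set(member_losses) | set(nonmember_losses))
    scored = [(t, membership_advantage(member_losses, nonmember_losses, t))
              for t in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical per-note losses from a model trained on (de-identified) notes.
members = [0.10, 0.20, 0.15, 0.40]      # notes that were in the training set
nonmembers = [0.90, 0.35, 1.20, 0.80]   # notes that were held out
threshold, advantage = best_threshold(members, nonmembers)
print(f"threshold={threshold:.2f}, advantage={advantage:.2f}")
```

With these made-up losses the script prints `threshold=0.20, advantage=0.75`. In a real evaluation, the losses would come from the target model. An advantage well above zero on training notes indicates membership leakage, and removing identifiers from the text does not by itself prevent this.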
Atiquer Rahman Sarkar
Dept. of Computer Science, University of Manitoba, Winnipeg, R3T 5V6, Canada
Yao-Shun Chuang
School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, 77030, USA
Noman Mohammed
Associate Professor, University of Manitoba
Data privacy, secure computation, applied cryptography
Xiaoqian Jiang
McWilliams School of Biomedical Informatics, UTHealth
Predictive modeling, healthcare privacy