🤖 AI Summary
This study investigates whether de-identification suffices for protecting privacy in clinical text and evaluates synthetic data as a viable alternative. We conduct the first systematic comparison between de-identified clinical notes and large language model–generated synthetic clinical notes enhanced with differential privacy, jointly assessing privacy preservation and downstream utility. We propose a dual-dimensional evaluation framework grounded in real-world re-identification attack success rates and NLP task performance. Results demonstrate that de-identification remains vulnerable to re-identification and suffers from low semantic fidelity. In contrast, synthetic notes reduce re-identification rates to below 0.5% while achieving an F1 score of 89.2% on clinical named entity recognition, significantly outperforming their de-identified counterparts (73.6%). Differentially private synthetic data thus delivers strong privacy guarantees and high task utility simultaneously, offering a robust alternative to conventional de-identification.
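The two axes of the evaluation framework can be made concrete with a small sketch. The function names, data, and thresholds below are illustrative assumptions, not the study's actual code: one metric measures the fraction of records an attacker links back to the correct patient, the other measures entity-level F1 on a clinical NER task.

```python
# Hypothetical sketch of the dual-dimensional evaluation described above:
# (1) privacy: re-identification attack success rate,
# (2) utility: entity-level F1 on clinical named entity recognition.
# All names and toy data are assumptions for illustration only.

def reid_success_rate(attack_guesses, true_identities):
    """Fraction of records an attacker links to the correct patient ID."""
    hits = sum(g == t for g, t in zip(attack_guesses, true_identities))
    return hits / len(true_identities)

def entity_f1(predicted, gold):
    """Micro F1 over predicted vs. gold entity spans (sets of tuples)."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy illustration: the attacker correctly links 1 of 200 records (0.5%).
truth = [f"p{i:03d}" for i in range(200)]
guesses = truth[:1] + ["unknown"] * 199
print(reid_success_rate(guesses, truth))  # 0.005

# Toy NER spans as (start, end, label) tuples.
pred_spans = [(0, 4, "DRUG"), (10, 18, "DISEASE")]
gold_spans = [(0, 4, "DRUG"), (10, 18, "DISEASE"), (25, 30, "DOSE")]
print(round(entity_f1(pred_spans, gold_spans), 2))  # 0.8
```

In the study's setup, both metrics would be computed for the de-identified corpus and the synthetic corpus, and a method is preferred only if it improves on both axes at once.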