🤖 AI Summary
This study identifies critical flaws in current post-hoc privacy filters applied to synthetic chest X-ray data. The filters show high sensitivity to real images but low specificity and poor consistency, and they fail to reliably detect near-duplicates of training images among the generated samples, which can create a false sense of security. To address this, we propose a three-part evaluation framework for synthetic medical imaging that assesses filters on both real and synthetic images along three dimensions: sensitivity, specificity, and consistency. The empirical evaluation shows that existing methods cannot robustly prevent training-data leakage and fall short of the privacy assurance required for clinical use. Our work quantifies key technical bottlenecks in privacy filtering and establishes a benchmark for assessing the privacy of synthetic medical data, which can guide the design of more robust and verifiable privacy-preserving methods for healthcare.
📝 Abstract
The generation of privacy-preserving synthetic datasets is a promising avenue for overcoming data scarcity in medical AI research. Post-hoc privacy filtering techniques, designed to remove samples containing personally identifiable information, have recently been proposed as a solution. However, their effectiveness remains largely unverified. This work presents a rigorous evaluation of a filtering pipeline applied to chest X-ray synthesis. Contrary to claims in the original publications, our results show that current filters exhibit limited specificity and consistency, achieving high sensitivity only for real images while failing to reliably detect near-duplicates generated from training data. These findings expose a critical limitation of post-hoc filtering: rather than effectively safeguarding patient privacy, these methods may provide a false sense of security while leaving unacceptable levels of patient information exposed. We conclude that substantial advances in filter design are needed before these methods can be confidently deployed in sensitive applications.
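To make the three evaluation axes concrete, the following is a minimal Python sketch. It is not the paper's protocol, and the definitions below are assumptions chosen for illustration. A filter is treated as a binary classifier that flags a sample as privacy-violating. A sample is labeled a "leak" if it is a real training image or a near-duplicate of one. Consistency is taken to be the fraction of samples on which repeated filter runs (for example with different seeds or augmentations) all return the same decision. The function names are hypothetical.

```python
from typing import Sequence

# Assumed setup (illustrative only): flags[i] is True if the filter removes
# sample i; is_leak[i] is True if sample i is a real training image or a
# near-duplicate of one.

def sensitivity(flags: Sequence[bool], is_leak: Sequence[bool]) -> float:
    """Fraction of true leaks that the filter flags (true positive rate)."""
    positives = [f for f, y in zip(flags, is_leak) if y]
    return sum(positives) / len(positives) if positives else float("nan")

def specificity(flags: Sequence[bool], is_leak: Sequence[bool]) -> float:
    """Fraction of non-leaking samples that the filter keeps (true negative rate)."""
    negatives = [f for f, y in zip(flags, is_leak) if not y]
    return sum(not f for f in negatives) / len(negatives) if negatives else float("nan")

def consistency(runs: Sequence[Sequence[bool]]) -> float:
    """Fraction of samples on which every repeated run makes the same decision.

    runs[r][i] is the filter's decision for sample i in run r (assumed definition).
    """
    n = len(runs[0])
    agree = sum(1 for i in range(n) if len({run[i] for run in runs}) == 1)
    return agree / n

# Toy, made-up data: three leaking and three novel samples.
is_leak = [True, True, True, False, False, False]
flags = [True, True, False, True, False, False]
print(sensitivity(flags, is_leak), specificity(flags, is_leak))

# Three hypothetical repeated runs over the same three samples.
runs = [[True, False, True], [True, True, True], [True, False, True]]
print(consistency(runs))
```

Under these assumed definitions, a filter can score well on one axis while failing the others. For example, flagging every sample gives perfect sensitivity and consistency but zero specificity, which is why the three measures are reported together.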