🤖 AI Summary
To address privacy-leakage risks in medical data sharing, this paper proposes a heart disease prediction framework that integrates differential privacy (DP) with federated learning (FL), enabling cross-institutional collaborative modeling without exchanging raw health records. Methodologically, we embed the Laplace mechanism into the Federated Averaging (FedAvg) protocol and design a hybrid XGBoost–MLP model to optimize the privacy–utility trade-off under a strict privacy budget (ε = 2.0). Evaluated on the UCI Heart Disease dataset, the framework achieves 85% test accuracy while all raw data remain exclusively on local devices; no patient data are ever uploaded. The approach delivers strong discriminative performance alongside formal privacy guarantees, supporting compliance with GDPR and HIPAA. By providing verifiable privacy assurances and practical deployability, the framework establishes a paradigm for privacy-sensitive, distributed AI modeling in healthcare.
📝 Abstract
With the rapid digitalization of healthcare systems, the volume of private health data being generated and shared has grown substantially. Safeguarding patient information is essential for maintaining trust and for complying with data-protection regulations. Machine learning plays a central role in healthcare, supporting personalized treatment, early disease detection, predictive analytics, image interpretation, drug discovery, operational efficiency, and patient monitoring; it improves decision-making, accelerates research, reduces errors, and improves patient outcomes. In this paper, we combine two privacy-preserving techniques, differential privacy and federated learning, to develop models that enable healthcare stakeholders to extract insights without compromising individual privacy. Differential privacy adds calibrated noise to guarantee formal, statistical privacy, while federated learning enables collaborative model training across decentralized datasets. We apply these techniques to heart disease data, demonstrating how they preserve privacy while delivering valuable insights and comprehensive analysis. Our results show that a federated learning model trained with differential privacy achieved a test accuracy of 85%, with patient data remaining secure and private throughout the process.
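To make the mechanism concrete, the following is a minimal, illustrative sketch of how Laplace noise can be folded into one FedAvg round: each client clips its local update to bound L1 sensitivity, adds Laplace noise scaled by clip/ε, and the server computes a size-weighted average. All function names and parameters here are illustrative assumptions; the paper's hybrid XGBoost–MLP model and full training pipeline are not reproduced.

```python
import numpy as np

def dp_client_update(weights, grad, lr, clip, epsilon, rng):
    """Compute one noisy local update: clip to L1 norm `clip`, then
    add Laplace noise with scale clip/epsilon (sensitivity / budget)."""
    update = -lr * grad
    norm = np.sum(np.abs(update))
    if norm > clip:
        update = update * (clip / norm)  # enforce bounded sensitivity
    noisy_update = update + rng.laplace(0.0, clip / epsilon, size=update.shape)
    return weights + noisy_update

def fedavg(client_weights, client_sizes):
    """Server-side FedAvg: average client models weighted by local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)
```

In a full round, the server would broadcast the averaged weights back to clients and repeat for several epochs; only noisy model parameters, never raw records, cross institutional boundaries.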