🤖 AI Summary
In medical federated learning (FL), aligning semantic meanings across heterogeneous, multi-source electronic health records (EHRs) remains challenging under strict privacy regulations (e.g., GDPR, HIPAA).
Method: This paper proposes a two-stage federated data alignment framework that integrates biomedical ontologies (UMLS/SNOMED CT) with fine-tuned clinical large language models (LLMs). Structured ontology knowledge and LLM embeddings are jointly incorporated into the federated data layer to enable dynamic, interpretable concept mapping, removing the data-homogeneity assumption typical of conventional FL. A privacy-preserving federated mapping protocol ensures regulatory compliance.
Results: Evaluated on real-world multi-center EHR data, the method achieves 92.3% cross-institutional concept mapping accuracy and improves downstream federated model AUC by 11.7%, significantly enhancing collaborative modeling over heterogeneous medical data.
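The two-stage mapping described above can be sketched roughly as: (1) an exact lookup against a shared ontology, falling back to (2) embedding-similarity matching when no exact concept exists locally. The snippet below is a minimal, self-contained illustration, not the paper's implementation: the ontology dictionary is a toy stand-in for SNOMED CT, and `embed` is a hashed character-trigram placeholder for a fine-tuned clinical LLM encoder. All names (`ONTOLOGY`, `map_concept`, the threshold value) are assumptions for illustration.

```python
import hashlib
import math

def embed(text, dim=64):
    """Toy embedding: hash character trigrams into a fixed-size unit vector.
    A real system would use a fine-tuned clinical LLM encoder instead."""
    vec = [0.0] * dim
    t = f"  {text.lower()}  "
    for i in range(len(t) - 2):
        h = int(hashlib.md5(t[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Toy stand-in for a shared ontology (e.g., SNOMED CT concept IDs).
ONTOLOGY = {
    "myocardial infarction": "SCTID:22298006",
    "type 2 diabetes mellitus": "SCTID:44054006",
    "hypertension": "SCTID:38341003",
}
ONTOLOGY_EMB = {term: embed(term) for term in ONTOLOGY}

def map_concept(local_term, threshold=0.5):
    """Stage 1: exact lookup in structured ontology knowledge.
    Stage 2: LLM-embedding similarity fallback for non-identical phrasings.
    Returns a shared concept ID, or None when no confident match exists."""
    key = local_term.strip().lower()
    if key in ONTOLOGY:                      # stage 1: ontology lookup
        return ONTOLOGY[key]
    q = embed(key)                           # stage 2: embedding similarity
    best_term, best_sim = max(
        ((t, cosine(q, e)) for t, e in ONTOLOGY_EMB.items()),
        key=lambda p: p[1],
    )
    return ONTOLOGY[best_term] if best_sim >= threshold else None
```

In a federated deployment, each institution would run this mapping locally and exchange only the resulting shared concept IDs (or protected embeddings), never raw EHR text, which is what keeps the protocol compatible with GDPR/HIPAA constraints.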
📝 Abstract
The rise of electronic health records (EHRs) has unlocked new opportunities for medical research, but privacy regulations and data heterogeneity remain key barriers to large-scale machine learning. Federated learning (FL) enables collaborative modeling without sharing raw data, yet faces challenges in harmonizing diverse clinical datasets. This paper presents a two-stage data alignment strategy integrating ontologies and large language models (LLMs) to support secure, privacy-preserving FL in healthcare, demonstrating its effectiveness in a real-world project involving semantic mapping of EHR data.