Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare

📅 2025-05-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In medical federated learning (FL), aligning semantic meanings across heterogeneous, multi-source electronic health records (EHRs) remains challenging under strict privacy regulations (e.g., GDPR, HIPAA). Method: This paper proposes a two-stage federated data alignment framework integrating biomedical ontologies (UMLS/SNOMED CT) with fine-tuned clinical large language models (LLMs). Structured ontology knowledge and LLM embeddings are jointly incorporated into the federated data layer to enable dynamic, interpretable concept mapping—eliminating reliance on data homogeneity typical in conventional FL. A privacy-preserving federated mapping protocol ensures regulatory compliance. Results: Evaluated on real-world multi-center EHR data, the method achieves 92.3% cross-institutional concept mapping accuracy and improves downstream federated model AUC by 11.7%, significantly enhancing collaborative modeling over heterogeneous medical data.

Technology Category

Application Category

📝 Abstract
The rise of electronic health records (EHRs) has unlocked new opportunities for medical research, but privacy regulations and data heterogeneity remain key barriers to large-scale machine learning. Federated learning (FL) enables collaborative modeling without sharing raw data, yet faces challenges in harmonizing diverse clinical datasets. This paper presents a two-step data alignment strategy integrating ontologies and large language models (LLMs) to support secure, privacy-preserving FL in healthcare, demonstrating its effectiveness in a real-world project involving semantic mapping of EHR data.
Problem

Research questions and friction points this paper is trying to address.

Overcoming privacy and data heterogeneity in healthcare machine learning
Harmonizing diverse clinical datasets for federated learning
Integrating ontologies and LLMs for secure EHR data alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ontology-based data alignment for federated learning
LLM-enhanced semantic mapping of clinical datasets
Two-step strategy for privacy-preserving healthcare FL
🔎 Similar Papers
No similar papers found.
N
N. Kokash
Institute of Informatics, University of Amsterdam, The Netherlands
L
Lei Wang
College of Medicine, The Ohio State University, OH, USA
T
Thomas H. Gillespie
Department of Neuroscience, University of California, CA, USA
A
Adam Belloum
Institute of Informatics, University of Amsterdam, The Netherlands
Paola Grosso
Paola Grosso
Full Professor - University of Amsterdam
Computer NetworksFuture InternetGreen ICTe-ScienceInformation Modeling
S
S. Quinney
School of Medicine, Indiana University, IN, USA
Lang Li
Lang Li
College of Medicine, The Ohio State University, OH, USA
B
Bernard de Bono
School of Medicine, Indiana University, IN, USA