KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings

📅 2025-10-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Medical code representation faces an inherent tension between knowledge graph–based approaches—which emphasize formal semantics but overlook real-world clinical patterns—and data-driven methods—which capture empirical associations yet neglect structured medical knowledge. To address this, we propose KEEP, the first framework to decouple knowledge graph embedding initialization from electronic health record–guided regularized adaptive learning. KEEP preserves ontological structure while effectively modeling clinical co-occurrence patterns. Its key innovation lies in generating general-purpose, lightweight code embeddings without task-specific fine-tuning, thereby jointly optimizing semantic fidelity and predictive performance. Extensive evaluation on UK Biobank and MIMIC-IV demonstrates that KEEP significantly outperforms conventional embedding methods and medical language models in both semantic relation modeling and downstream clinical prediction tasks—including disease risk forecasting and length-of-stay estimation—particularly under resource-constrained settings.

📝 Abstract
Machine learning in healthcare requires effective representation of structured medical codes, but current methods face a trade-off: knowledge graph–based approaches capture formal relationships but miss real-world patterns, while data-driven methods learn empirical associations but often overlook the structured knowledge in medical terminologies. We present KEEP (Knowledge-preserving and Empirically refined Embedding Process), an efficient framework that bridges this gap by combining knowledge graph embeddings with adaptive learning from clinical data. KEEP first generates embeddings from knowledge graphs, then employs regularized training on patient records to adaptively integrate empirical patterns while preserving ontological relationships. Importantly, KEEP produces final embeddings without task-specific auxiliary or end-to-end training, enabling it to support multiple downstream applications and model architectures. Evaluations on structured EHRs from UK Biobank and MIMIC-IV demonstrate that KEEP outperforms both traditional and language model–based approaches in capturing semantic relationships and predicting clinical outcomes. Moreover, KEEP's minimal computational requirements make it particularly suitable for resource-constrained environments.
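The abstract's two-stage recipe (knowledge-graph-derived initialization, then regularized adaptation on patient records) can be illustrated with a small sketch. This is not the authors' implementation: the logistic co-occurrence objective, the negative sampling, and the `adapt_embeddings` name are assumptions chosen only to show how an L2 anchor to the initial embeddings can preserve ontological structure while clinical co-occurrence reshapes the space.

```python
import numpy as np

def adapt_embeddings(E0, cooccur_pairs, lam=0.1, lr=0.05, epochs=200, seed=0):
    """Refine knowledge-graph-initialized code embeddings on clinical
    co-occurrence pairs while an L2 penalty anchors them to E0.

    E0            : (n_codes, dim) array of KG-derived embeddings
    cooccur_pairs : iterable of (i, j) code index pairs seen together in EHR
    lam           : strength of the knowledge-preserving regularizer
    """
    rng = np.random.default_rng(seed)
    E = E0.copy()
    n = E.shape[0]
    for _ in range(epochs):
        grad = np.zeros_like(E)
        for i, j in cooccur_pairs:
            # attract co-occurring codes (logistic loss on the dot product)
            s = 1.0 / (1.0 + np.exp(-(E[i] @ E[j])))
            grad[i] -= (1.0 - s) * E[j]
            grad[j] -= (1.0 - s) * E[i]
            # repel one randomly sampled negative code per positive pair
            k = rng.integers(n)
            while k == i or k == j:
                k = rng.integers(n)
            s_neg = 1.0 / (1.0 + np.exp(-(E[i] @ E[k])))
            grad[i] += s_neg * E[k]
            grad[k] += s_neg * E[i]
        # knowledge-preserving term: gradient of lam * ||E - E0||^2,
        # which pulls the solution back toward the KG initialization
        grad += 2.0 * lam * (E - E0)
        E -= lr * grad
    return E
```

Because the regularizer's pull grows with distance from `E0` while the attraction between co-occurring codes saturates, `lam` directly trades off empirical fit against fidelity to the ontology-derived starting point.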
Problem

Research questions and friction points this paper is trying to address.

Integrating medical ontologies with clinical data
Bridging knowledge graphs and empirical data-driven methods
Creating robust medical code embeddings for healthcare applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates knowledge graphs with clinical data adaptively
Generates embeddings preserving ontological relationships via regularization
Produces task-agnostic embeddings for multiple downstream applications
Ahmed Elhussein
Columbia University, USA
Paul Meddeb
New York Genome Center, USA
Abigail Newbury
Columbia University, USA
Jeanne Mirone
New York Genome Center, USA
Martin Stoll
TU Chemnitz
Gamze Gursoy
New York Genome Center, USA