KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings

📅 2025-10-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Medical code representation faces an inherent tension between knowledge graph–based approaches—which emphasize formal semantics but overlook real-world clinical patterns—and data-driven methods—which capture empirical associations yet neglect structured medical knowledge. To address this, we propose KEEP, the first framework to decouple knowledge graph embedding initialization from electronic health record–guided regularized adaptive learning. KEEP preserves ontological structure while effectively modeling clinical co-occurrence patterns. Its key innovation lies in generating general-purpose, lightweight code embeddings without task-specific fine-tuning, thereby jointly optimizing semantic fidelity and predictive performance. Extensive evaluation on UK Biobank and MIMIC-IV demonstrates that KEEP significantly outperforms conventional embedding methods and medical language models in both semantic relation modeling and downstream clinical prediction tasks—including disease risk forecasting and length-of-stay estimation—particularly under resource-constrained settings.

📝 Abstract
Machine learning in healthcare requires effective representation of structured medical codes, but current methods face a trade-off: knowledge graph–based approaches capture formal relationships but miss real-world patterns, while data-driven methods learn empirical associations but often overlook the structured knowledge in medical terminologies. We present KEEP (Knowledge-preserving and Empirically refined Embedding Process), an efficient framework that bridges this gap by combining knowledge graph embeddings with adaptive learning from clinical data. KEEP first generates embeddings from knowledge graphs, then employs regularized training on patient records to adaptively integrate empirical patterns while preserving ontological relationships. Importantly, KEEP produces final embeddings without task-specific auxiliary or end-to-end training, enabling it to support multiple downstream applications and model architectures. Evaluations on structured EHRs from UK Biobank and MIMIC-IV demonstrate that KEEP outperforms both traditional and language model–based approaches in capturing semantic relationships and predicting clinical outcomes. Moreover, KEEP's minimal computational requirements make it particularly suitable for resource-constrained environments.
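The abstract's two-stage recipe (knowledge-graph-derived initialization, then regularized adaptation on patient records) can be illustrated with a small sketch. This is not the authors' implementation: the logistic co-occurrence objective, the negative sampling, and the `adapt_embeddings` name are assumptions chosen only to show how an L2 anchor to the initial embeddings can preserve ontological structure while clinical co-occurrence reshapes the space.

```python
import numpy as np

def adapt_embeddings(E0, cooccur_pairs, lam=0.1, lr=0.05, epochs=200, seed=0):
    """Refine knowledge-graph-initialized code embeddings on clinical
    co-occurrence pairs while an L2 penalty anchors them to E0.

    E0            : (n_codes, dim) array of KG-derived embeddings
    cooccur_pairs : iterable of (i, j) code index pairs seen together in EHR
    lam           : strength of the knowledge-preserving regularizer
    """
    rng = np.random.default_rng(seed)
    E = E0.copy()
    n = E.shape[0]
    for _ in range(epochs):
        grad = np.zeros_like(E)
        for i, j in cooccur_pairs:
            # attract co-occurring codes (logistic loss on the dot product)
            s = 1.0 / (1.0 + np.exp(-(E[i] @ E[j])))
            grad[i] -= (1.0 - s) * E[j]
            grad[j] -= (1.0 - s) * E[i]
            # repel one randomly sampled negative code per positive pair
            k = rng.integers(n)
            while k == i or k == j:
                k = rng.integers(n)
            s_neg = 1.0 / (1.0 + np.exp(-(E[i] @ E[k])))
            grad[i] += s_neg * E[k]
            grad[k] += s_neg * E[i]
        # knowledge-preserving term: gradient of lam * ||E - E0||^2,
        # which pulls the solution back toward the KG initialization
        grad += 2.0 * lam * (E - E0)
        E -= lr * grad
    return E
```

Because the regularizer's pull grows with distance from `E0` while the attraction between co-occurring codes saturates, `lam` directly trades off empirical fit against fidelity to the ontology-derived starting point.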
Problem

Research questions and friction points this paper is trying to address.

Integrating medical ontologies with clinical data
Bridging knowledge graphs and empirical data-driven methods
Creating robust medical code embeddings for healthcare applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates knowledge graphs with clinical data adaptively
Generates embeddings preserving ontological relationships via regularization
Produces task-agnostic embeddings for multiple downstream applications
Ahmed Elhussein
Columbia University, USA
Paul Meddeb
New York Genome Center, USA
Abigail Newbury
Columbia University, USA
Jeanne Mirone
New York Genome Center, USA
Martin Stoll
TU Chemnitz
Gamze Gursoy
New York Genome Center, USA