AI Summary
Medical code representation faces an inherent tension between knowledge-graph-based approaches, which emphasize formal semantics but overlook real-world clinical patterns, and data-driven methods, which capture empirical associations yet neglect structured medical knowledge. To address this, we propose KEEP, the first framework to decouple knowledge graph embedding initialization from electronic health record-guided regularized adaptive learning. KEEP preserves ontological structure while effectively modeling clinical co-occurrence patterns. Its key innovation lies in generating general-purpose, lightweight code embeddings without task-specific fine-tuning, thereby jointly optimizing semantic fidelity and predictive performance. Extensive evaluation on UK Biobank and MIMIC-IV demonstrates that KEEP significantly outperforms conventional embedding methods and medical language models in both semantic relation modeling and downstream clinical prediction tasks (including disease risk forecasting and length-of-stay estimation), particularly under resource-constrained settings.
Abstract
Machine learning in healthcare requires effective representation of structured medical codes, but current methods face a trade-off: knowledge-graph-based approaches capture formal relationships but miss real-world patterns, while data-driven methods learn empirical associations but often overlook structured knowledge in medical terminologies. We present KEEP (Knowledge-preserving and Empirically refined Embedding Process), an efficient framework that bridges this gap by combining knowledge graph embeddings with adaptive learning from clinical data. KEEP first generates embeddings from knowledge graphs, then employs regularized training on patient records to adaptively integrate empirical patterns while preserving ontological relationships. Importantly, KEEP produces final embeddings without task-specific auxiliary or end-to-end training, enabling it to support multiple downstream applications and model architectures. Evaluations on structured EHR data from UK Biobank and MIMIC-IV demonstrate that KEEP outperforms both traditional and language-model-based approaches in capturing semantic relationships and predicting clinical outcomes. Moreover, KEEP's minimal computational requirements make it particularly suitable for resource-constrained environments.
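The two-stage idea (initialize from a knowledge graph, then refine on patient data under a regularizer) can be illustrated with a minimal sketch. This is not the KEEP implementation: the squared-error co-occurrence objective, the L2 penalty toward the initial embeddings, the function name `refine_embeddings`, and all hyperparameters are assumptions chosen for illustration, and random vectors stand in for real knowledge graph embeddings.

```python
import numpy as np


def refine_embeddings(init_emb, cooc, reg_strength=1.0, lr=0.05, steps=300):
    """Fit embeddings to co-occurrence targets while penalizing drift from init_emb.

    Hypothetical objective (an assumption, not taken from the paper):
        mean_{i != j} (e_i . e_j - log(1 + C_ij))^2
        + reg_strength * mean_i ||e_i - e0_i||^2
    `cooc` must be a symmetric matrix of co-occurrence counts.
    """
    emb = init_emb.copy()
    target = np.log1p(cooc)                      # dampen large counts
    mask = ~np.eye(len(cooc), dtype=bool)        # ignore self-pairs
    n_pairs = mask.sum()
    for _ in range(steps):
        residual = (emb @ emb.T - target) * mask
        # Gradient of the fit term; the factor 4 relies on residual being symmetric.
        grad_fit = 4.0 * residual @ emb / n_pairs
        # Gradient of the penalty that keeps embeddings near their initial values.
        grad_reg = 2.0 * reg_strength * (emb - init_emb) / len(emb)
        emb -= lr * (grad_fit + grad_reg)
    return emb


# Toy data: 6 medical codes, 4-dimensional embeddings.
rng = np.random.default_rng(0)
n_codes, dim = 6, 4
init = rng.normal(scale=0.1, size=(n_codes, dim))   # placeholder for KG embeddings
counts = rng.integers(0, 10, size=(n_codes, n_codes))
cooc = np.triu(counts, 1)
cooc = cooc + cooc.T                                 # symmetric, zero diagonal

weak = refine_embeddings(init, cooc, reg_strength=0.0)
strong = refine_embeddings(init, cooc, reg_strength=10.0)
drift_weak = np.linalg.norm(weak - init)
drift_strong = np.linalg.norm(strong - init)
```

In this sketch, `reg_strength` controls the trade-off the abstract describes: a larger value keeps the refined vectors closer to the knowledge-graph initialization, while a smaller value lets the co-occurrence data dominate. Because the output is just an embedding matrix, it can be passed to any downstream model without task-specific training.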