Med-gte-hybrid: A contextual embedding transformer model for extracting actionable information from clinical texts

📅 2025-02-21
🤖 AI Summary
This study targets clinical decision support for chronic kidney disease (CKD) by addressing two challenges in unstructured electronic health records (EHRs): actionable medical information is hard to extract, and semantic representations generalize poorly. The paper proposes med-gte-hybrid, a contextual embedding model derived from the gte-large sentence transformer and tuned with a strategy that combines contrastive learning with a denoising autoencoder. The model is evaluated on large patient cohorts from MIMIC-IV across three clinical prediction tasks: CKD prognosis, estimated glomerular filtration rate (eGFR) prediction, and mortality prediction. The authors also report improved patient stratification, clustering, and text retrieval, and state that the model outperforms current state-of-the-art models on the Massive Text Embedding Benchmark (MTEB). Although part of the evaluation focuses on CKD, the tuning approach could be transferred to other medical domains to support clinical decision-making and personalised treatment pathways.

📝 Abstract
We introduce med-gte-hybrid, a novel contextual embedding model derived from the gte-large sentence transformer, to extract information from unstructured clinical narratives. Our tuning strategy for med-gte-hybrid combines contrastive learning and a denoising autoencoder. To evaluate the performance of med-gte-hybrid, we investigate several clinical prediction tasks in large patient cohorts extracted from the MIMIC-IV dataset, including Chronic Kidney Disease (CKD) patient prognosis, estimated glomerular filtration rate (eGFR) prediction, and patient mortality prediction. Furthermore, we demonstrate that the med-gte-hybrid model improves patient stratification, clustering, and text retrieval, and thereby outperforms current state-of-the-art models on the Massive Text Embedding Benchmark (MTEB). While some of our evaluations focus on CKD, our hybrid tuning of sentence transformers could be transferred to other medical domains and has the potential to improve clinical decision-making and personalised treatment pathways in various healthcare applications.
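The abstract names two tuning ingredients, contrastive learning and a denoising autoencoder, but does not say how they are combined. The sketch below is only an illustration of what each objective computes, written in plain Python. The function names, the token-deletion noise, the temperature, and the weighted-sum combination with `alpha` are all assumptions for illustration and are not taken from the paper.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE contrastive loss.

    positives[i] is the matching view of anchors[i]; every other row of
    `positives` acts as an in-batch negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def delete_tokens(tokens, ratio=0.6, rng=random):
    """Denoising input noise (assumed): drop each token with probability `ratio`.

    At least one token is always kept so the encoder never sees an empty input.
    """
    kept = [t for t in tokens if rng.random() >= ratio]
    return kept or [rng.choice(tokens)]


def reconstruction_nll(target_token_probs):
    """Mean negative log-likelihood a decoder assigns to the original tokens."""
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)


def hybrid_loss(contrastive, reconstruction, alpha=0.5):
    """Hypothetical combination: a weighted sum of the two objectives."""
    return alpha * contrastive + (1.0 - alpha) * reconstruction
```

In this sketch, a sentence would be corrupted with `delete_tokens`, encoded, and scored by a decoder for `reconstruction_nll`, while `info_nce` pulls together embeddings of paired views of the same note. Other ways of combining the two, such as tuning separate models and merging them afterwards, are equally compatible with the abstract.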
Problem

Research questions and friction points this paper is trying to address.

Extracts information from clinical narratives
Improves patient stratification and clustering
Enhances clinical decision-making processes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid contextual embedding model derived from gte-large
Tuning strategy that combines contrastive learning with a denoising autoencoder
Approach transferable to other medical domains