Curation of a Cardiology Interface Terminology for Highlighting Electronic Health Records using Machine Learning

📅 2026-06-06
🤖 AI Summary
This study addresses the challenge of information overload and complex terminology in electronic health records (EHRs), which can lead clinicians to overlook critical details. To mitigate this, the authors propose a three-stage, minimally supervised approach for constructing a Cardiology Interface Terminology (CIT). The method first integrates SNOMED CT subhierarchies, EHR-derived concepts, and terminological components to generate an initial CIT. It then iteratively extracts phrases and employs semi-automated review to build a training set (TCIT). Finally, a supervised machine learning model is trained on TCIT to expand the CIT and highlight relevant terms in EHRs. This novel integration of semi-automated term curation with machine learning achieves 74.21% coverage and a breadth of 1.68 on the test set, with manual evaluation of 20 random clinical notes showing 98.2% average completeness and 84.2% conciseness.
📝 Abstract
Electronic health record (EHR) notes are dense medical documents that contain large amounts of information and are often filled with complex medical jargon. Highlighting all details in EHRs draws attention to key content and so reduces the likelihood of missing crucial information. This study proposes the design of a Cardiology Interface Terminology (CIT) to accurately highlight all details in the EHR notes of cardiology patients. We introduce an innovative machine learning (ML) technique for the design of the CIT. The ML technique requires training data, and manual preparation of such data is time-consuming and expensive. The CIT design process therefore has three phases, the first two of which derive a training-data CIT for use by the ML technique in the third phase. We start by designing an initial CIT composed of several components: the cardiology-related sub-hierarchies of SNOMED, other SNOMED concepts mined from the EHRs of the build set, and necessary components of terms, e.g., medical abbreviations and medications. Through an iterative process, fine-grained phrases containing initial CIT concepts are extracted from the build set as CIT concept candidates. The candidate concepts are semi-automatically reviewed before being added to the CIT, yielding the training-data CIT (TCIT). In the third phase, an ML model is trained with the TCIT to identify candidates fit to be concepts in the CIT. This model is used to extract further concepts from the build set, yielding the final CIT. The final CIT is then used to highlight the test set and to evaluate the extent to which it captures details in an unseen EHR dataset. Four evaluation metrics are used for this purpose: coverage, breadth, completeness, and conciseness. The highlighted test set has a coverage of 74.21% and a breadth of 1.68. For 20 random notes in the test set, the average completeness is 98.2% and the average conciseness is 84.2%.
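To make the highlighting and scoring step concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not define the metrics, so two assumptions are made here: coverage is taken to be the share of words in a note that fall inside a highlighted span, and breadth the average number of words per highlighted span. The greedy longest-match strategy, the function names, and the toy terminology are likewise hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def highlight(tokens: List[str], terminology: Set[Tuple[str, ...]]) -> List[Span]:
    """Greedy longest-match highlighting (an assumed strategy, not the paper's)."""
    max_len = max((len(term) for term in terminology), default=0)
    lowered = [t.lower() for t in tokens]
    spans: List[Span] = []
    i = 0
    while i < len(lowered):
        matched = False
        # Try the longest candidate phrase first, then shrink.
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            if tuple(lowered[i:i + n]) in terminology:
                spans.append((i, i + n))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans


def coverage(tokens: List[str], spans: List[Span]) -> float:
    """Assumed definition: fraction of words inside any highlighted span."""
    if not tokens:
        return 0.0
    return sum(end - start for start, end in spans) / len(tokens)


def breadth(spans: List[Span]) -> float:
    """Assumed definition: average number of words per highlighted span."""
    if not spans:
        return 0.0
    return sum(end - start for start, end in spans) / len(spans)


# Toy terminology and note, invented for illustration only.
toy_cit = {("atrial", "fibrillation"), ("chest", "pain"), ("patient",)}
note = "Patient has atrial fibrillation and chest pain today".split()
spans = highlight(note, toy_cit)

print(spans)                            # [(0, 1), (2, 4), (5, 7)]
print(coverage(note, spans))            # 0.625
print(round(breadth(spans), 2))         # 1.67
```

Under these assumed definitions, a terminology with longer, fine-grained phrases raises breadth, while adding more matching concepts raises coverage. Completeness and conciseness are assessed manually in the study and are not modeled here.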
Problem

Research questions and friction points this paper is trying to address.

Cardiology Interface Terminology
Electronic Health Records
Information Highlighting
Clinical Terminology
Detail Extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cardiology Interface Terminology
Machine Learning
Electronic Health Records
Terminology Curation
Automated Highlighting
Mahshad Koohi Habibi Dehkordi
Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
Shuxin Zhou
Department of Computer Science, St. Francis College, Brooklyn, NY, USA
Yehoshua Perl
Director of SABOC research center. Professor of computer science, NJIT
medical terminologies, ontologies, algorithms, object oriented data bases
Fadi P. Deek
Department of Informatics, New Jersey Institute of Technology, Newark, NJ, USA
James Geller
Professor of Computer Science, New Jersey Institute of Technology
Medical Informatics, Medical Terminologies, Semantic Search of the Web
Gai Elhanan
Associate researcher, DRI
Healthcare Informatics
Andrew J. Einstein
Professor of Medicine, Columbia University
Cardiology, Cardiac Imaging, Radiological Protection
Luke Lindemann
Advanced Metrics Laboratory, School of Medicine and Health Sciences, George Washington University, Washington, DC, USA
Vipina K. Keloth
Yale University
Biomedical ontologies, NLP