Weakly Supervised Medical Entity Extraction and Linking for Chief Complaints

📅 2025-09-01
🤖 AI Summary
Medical chief complaint texts exhibit high lexical variability and suffer from a lack of annotated data, hindering terminology standardization. To address this, we propose a weakly supervised, end-to-end framework for entity extraction and ontology linking. Our approach introduces a novel “split-and-match” algorithm that automatically generates high-quality weak supervision signals—eliminating the need for manual annotation—and jointly models mention detection and standardized concept linking within a BERT-based architecture. Evaluated on 1.2 million real-world chief complaint records, our method significantly outperforms existing unsupervised and weakly supervised baselines in both precision and cross-institutional generalizability. It achieves robust performance without domain-specific lexicons or handcrafted rules, offering a scalable, low-dependency solution for clinical natural language processing tasks requiring consistent medical terminology normalization.

📝 Abstract
A chief complaint (CC) is the reason for a medical visit, stated in the patient's own words. It helps medical professionals quickly understand a patient's situation and also serves as a short summary for medical text mining. However, chief complaints are entered in many different ways, producing wide variation in medical notation that makes them difficult to standardize across medical institutions for record keeping or text mining. In this study, we propose a weakly supervised method that automatically extracts and links entities in chief complaints without human annotation. We first apply a split-and-match algorithm to produce weak annotations, including entity mention spans and class labels, on 1.2 million real-world, de-identified, IRB-approved chief complaint records. We then train a BERT-based model on the generated weak labels to locate entity mentions in chief complaint text and link them to a pre-defined ontology. In extensive experiments, our Weakly Supervised Entity Extraction and Linking method outperformed previous methods without using any human annotation.
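The abstract does not spell out the split-and-match algorithm, so the following is only a minimal sketch of one plausible reading: split a chief complaint on delimiters, then match each segment against ontology surface forms to obtain weak span and class annotations. The toy `ONTOLOGY` dictionary, the delimiter set in `DELIMS`, and the function name `split_and_match` are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical toy ontology: normalized surface form -> concept label.
# A real system would load surface forms from the pre-defined ontology.
ONTOLOGY = {
    "chest pain": "CHEST_PAIN",
    "cp": "CHEST_PAIN",
    "shortness of breath": "DYSPNEA",
    "sob": "DYSPNEA",
    "fever": "FEVER",
}

# Assumed delimiters between complaints; the paper's actual rules may differ.
DELIMS = re.compile(r"[,;/&+]|\band\b", re.IGNORECASE)


def split_and_match(text, ontology=ONTOLOGY):
    """Return weak annotations as (start, end, label) character spans."""
    annotations = []
    start = 0
    # Walk over every delimiter, plus a final sentinel for the last segment.
    for delim in list(DELIMS.finditer(text)) + [None]:
        end = delim.start() if delim else len(text)
        segment = text[start:end]
        stripped = segment.strip()
        if stripped:
            # Normalize case and internal whitespace before the lookup.
            key = " ".join(stripped.lower().split())
            label = ontology.get(key)
            if label is not None:
                # Shift the span start past any leading whitespace.
                span_start = start + len(segment) - len(segment.lstrip())
                annotations.append((span_start, span_start + len(stripped), label))
        start = delim.end() if delim else end
    return annotations


print(split_and_match("CP, sob and fever"))
# [(0, 2, 'CHEST_PAIN'), (4, 7, 'DYSPNEA'), (12, 17, 'FEVER')]
```

Segments with no ontology match are simply left unlabeled in this sketch, which is one reason such annotations are weak rather than gold labels.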
Problem

Research questions and friction points this paper is trying to address.

Extracting medical entities from chief complaints
Linking entities to ontology without human annotation
Standardizing varied medical notations across institutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weakly supervised split-and-match algorithm
BERT-based model for entity extraction
Automatic linking to predefined ontology
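To train a BERT-style tagger on weak labels, character spans are commonly converted to per-token tags. The sketch below assumes a BIO tagging scheme and a simple regex tokenizer as a stand-in for a subword tokenizer; neither choice is confirmed by the source, and `spans_to_bio` is a hypothetical helper name.

```python
import re


def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans into token-level BIO tags.

    Tokens are words or single punctuation marks; a real pipeline would use
    the model's own tokenizer and its character offsets instead.
    """
    tokens, tags = [], []
    for tok in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for start, end, label in spans:
            # A token is tagged only if it lies fully inside a weak span.
            if tok.start() >= start and tok.end() <= end:
                prefix = "B-" if tok.start() == start else "I-"
                tag = prefix + label
                break
        tokens.append(tok.group())
        tags.append(tag)
    return tokens, tags


tokens, tags = spans_to_bio(
    "chest pain and fever",
    [(0, 10, "CHEST_PAIN"), (15, 20, "FEVER")],
)
print(list(zip(tokens, tags)))
# [('chest', 'B-CHEST_PAIN'), ('pain', 'I-CHEST_PAIN'), ('and', 'O'), ('fever', 'B-FEVER')]
```

Here each tag carries the concept label, so mention detection and linking are expressed as one token-classification target; the paper's model may separate the two steps.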
Authors
Zhimeng Luo
University of Pittsburgh
Natural Language Processing, Data Mining, Health Informatics
Zhendong Wang
School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States
Rui Meng
Salesforce Research
Machine Learning, Natural Language Processing
Diyang Xue
School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States
Adam Frisch
University of Pittsburgh
Daqing He
School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States