Using LLMs for Multilingual Clinical Entity Linking to ICD-10

2025-09-05
AI Summary
This study addresses the problem of automatically linking entities in multilingual clinical texts to ICD-10 codes. We propose a multi-stage hybrid approach that combines exact clinical dictionary matching with large language model (LLM)-based in-context learning. Specifically, we use GPT-4.1 in a few-shot setting to handle terms that are not found in the dictionaries, producing ICD-10 code predictions for Spanish and Greek clinical text. The method balances the interpretability of dictionary matching with the semantic generalization of LLMs, with the aim of making clinical coding more efficient and consistent. Evaluated on the CodiEsp dataset, the approach achieves 0.89 category-level F1 and 0.78 subcategory-level F1; on ElCardioCC, it attains 0.85 F1, outperforming existing baselines. The framework offers a scalable approach to automating multilingual clinical coding.

Abstract
The linking of clinical entities is a crucial part of extracting structured information from clinical texts. It is the process of assigning a code from a medical ontology or classification to a phrase in the text. The International Classification of Diseases, 10th Revision (ICD-10) is an international standard for classifying diseases for statistical and insurance purposes. Automatically assigning the correct ICD-10 code to terms in discharge summaries will simplify the work of healthcare professionals and ensure consistent coding in hospitals. Our paper proposes an approach for linking clinical terms to ICD-10 codes in different languages using Large Language Models (LLMs). The approach consists of a multistage pipeline that first uses clinical dictionaries to match unambiguous terms in the text and then applies in-context learning with GPT-4.1 to predict the ICD-10 code for the terms that do not match the dictionary. Our system shows promising results in predicting ICD-10 codes on benchmark datasets in Spanish (0.89 F1 for categories and 0.78 F1 for subcategories on CodiEsp) and Greek (0.85 F1 on ElCardioCC).
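The relation between the category-level and subcategory-level scores quoted above can be illustrated with a small sketch. It assumes that the category is the first three characters of an ICD-10 code (for example I21) and that the subcategory is the full code (for example I21.0). The set-based F1 over a single document's codes and the sample codes are illustrative assumptions, not the official scoring scripts of CodiEsp or ElCardioCC.

```python
from typing import Iterable, Set, Tuple


def to_category(code: str) -> str:
    """Truncate an ICD-10 code to its three-character category (I21.0 -> I21)."""
    return code.strip().upper()[:3]


def f1(gold: Set[str], pred: Set[str]) -> float:
    """Set-based F1 between gold and predicted labels."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def score(gold_codes: Iterable[str], pred_codes: Iterable[str]) -> Tuple[float, float]:
    """Return (category-level F1, subcategory-level F1) for one document."""
    gold_sub = {c.strip().upper() for c in gold_codes}
    pred_sub = {c.strip().upper() for c in pred_codes}
    gold_cat = {to_category(c) for c in gold_sub}
    pred_cat = {to_category(c) for c in pred_sub}
    return f1(gold_cat, pred_cat), f1(gold_sub, pred_sub)


# Illustrative codes only: the first prediction has the right category (I21)
# but the wrong subcategory, so it counts at category level only.
category_f1, subcategory_f1 = score(
    ["I21.0", "E11.9", "J18.9"],
    ["I21.9", "E11.9", "J18.9"],
)
```

Under these assumptions a prediction that lands in the correct category but picks the wrong subcategory still counts at the category level, which is one reason a category-level F1 is typically at least as high as the subcategory-level one.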
Problem

Research questions and friction points this paper is trying to address.

Linking clinical terms to ICD-10 codes across languages
Automating disease classification from medical discharge summaries
Applying LLMs to multilingual clinical entity linking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using clinical dictionaries for unambiguous term matching
Applying in-context learning with GPT-4.1
Multistage pipeline for multilingual ICD-10 coding
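
The two stages listed above, dictionary matching followed by few-shot prompting for unmatched terms, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary entries, the few-shot examples, the prompt wording, and the `llm` callable are all hypothetical placeholders. A real system would send the prompt to GPT-4.1 through an API client and would use full clinical dictionaries.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy dictionary: normalized clinical term -> ICD-10 code.
DICTIONARY: Dict[str, str] = {
    "infarto agudo de miocardio": "I21.9",
    "diabetes mellitus tipo 2": "E11.9",
}

# Hypothetical few-shot examples shown to the model as (term, code) pairs.
FEW_SHOT: List[Tuple[str, str]] = [
    ("asma", "J45.9"),
    ("anemia", "D64.9"),
]


def normalize(term: str) -> str:
    """Lowercase the term and collapse whitespace before dictionary lookup."""
    return " ".join(term.lower().split())


def build_prompt(term: str, examples: List[Tuple[str, str]]) -> str:
    """Assemble a few-shot prompt ending with the term to be coded."""
    parts = ["Assign an ICD-10 code to the clinical term."]
    for example_term, example_code in examples:
        parts.append(f"Term: {example_term}\nCode: {example_code}")
    parts.append(f"Term: {term}\nCode:")
    return "\n\n".join(parts)


def link_term(term: str, llm: Callable[[str], str]) -> Tuple[str, str]:
    """Return (code, source): dictionary match first, LLM as the fallback."""
    key = normalize(term)
    if key in DICTIONARY:
        return DICTIONARY[key], "dictionary"
    # Assumption: `llm` takes a prompt string and returns the model's text reply.
    return llm(build_prompt(term, FEW_SHOT)).strip(), "llm"
```

Passing the model as a plain callable keeps the sketch independent of any particular API client, and returning the source alongside the code makes it easy to see which predictions came from the dictionary and which from the model.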
Sylvia Vassileva
Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski
Ivan Koychev
Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski
Svetla Boytcheva
Ontotext
Artificial Intelligence, Computational Linguistics, Medical Informatics, Machine Learning