Natural Language Processing for Electronic Health Records in Scandinavian Languages: Norwegian, Swedish, and Danish

📅 2025-03-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Clinical natural language processing (NLP) for the mainland Scandinavian languages has lacked a systematic, cross-lingual assessment of progress, resource availability, and methodological trends. Method: We conducted a systematic review of 113 articles published in English between 2010 and 2024, searching PubMed, ScienceDirect, Google Scholar, ACM Digital Library, and IEEE Xplore for work on Norwegian, Swedish, and Danish clinical text. Results: Swedish dominates the field (64% of studies, n=72), while Norwegian (18%, n=21) and Danish (10%, n=11) lag behind, especially in essential tasks such as de-identification and in the adoption of transformer-based models; the remaining 8% (n=9) cover more than one language. Sharing of data, code, and pretrained models is low, as is the rate of adaptation and transfer learning. The review documents these disparities and highlights the barriers that hinder the advancement of clinical NLP for electronic health record (EHR) text in the region.

📝 Abstract

Background: Clinical natural language processing (NLP) refers to the use of computational methods for extracting, processing, and analyzing unstructured clinical text data, and holds great potential to transform healthcare across a range of clinical tasks. Objective: The study aims to systematically review and analyze state-of-the-art NLP methods for clinical text in the mainland Scandinavian languages. Method: A literature search was conducted between December 2022 and February 2024 in several online databases, including PubMed, ScienceDirect, Google Scholar, ACM Digital Library, and IEEE Xplore. Relevant references cited in the included articles were also used to strengthen the search. The final pool includes articles that applied clinical NLP to the mainland Scandinavian languages and were published in English between 2010 and 2024. Results: Of the 113 articles, 18% (n=21) focus on Norwegian clinical text, 64% (n=72) on Swedish, 10% (n=11) on Danish, and 8% (n=9) on more than one language. Overall, the review identified positive developments across the region, despite observable gaps and disparities between the languages. There are substantial disparities in the level of adoption of transformer-based models. In essential tasks such as de-identification, there is significantly less research activity on Norwegian and Danish than on Swedish text. Further, the review identified low levels of resource sharing (data, experimentation code, and pre-trained models) and a low rate of adaptation and transfer learning in the region. Conclusion: The review presents a comprehensive assessment of state-of-the-art clinical NLP for electronic health record (EHR) text in the mainland Scandinavian languages and highlights the potential barriers and challenges that hinder the rapid advancement of the field in the region.
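
The language shares reported in the Results can be re-derived from the article counts. The sketch below is only an illustration: the function name and the largest-remainder rounding rule are assumptions, not something the abstract states. It is included because plain rounding would give 19% for Norwegian (21/113 is about 18.6%) and a total of 101%, whereas largest-remainder rounding reproduces the reported 18%, 64%, 10%, and 8%, which sum to 100%.

```python
# Counts taken from the abstract's Results; everything else is illustrative.
counts = {
    "Norwegian": 21,
    "Swedish": 72,
    "Danish": 11,
    "More than one language": 9,
}


def largest_remainder_percentages(counts):
    """Return whole-number percentages that sum to exactly 100.

    Hypothetical helper: the abstract does not say how its percentages
    were rounded. This uses the largest-remainder rule as an assumption.
    """
    total = sum(counts.values())
    # Integer quotient and remainder of each share, avoiding float error.
    parts = {key: divmod(100 * n, total) for key, n in counts.items()}
    shares = {key: quotient for key, (quotient, _) in parts.items()}
    # Hand the leftover points to the categories with the largest remainders.
    leftover = 100 - sum(shares.values())
    by_remainder = sorted(parts, key=lambda key: parts[key][1], reverse=True)
    for key in by_remainder[:leftover]:
        shares[key] += 1
    return shares


total_articles = sum(counts.values())
shares = largest_remainder_percentages(counts)

print(total_articles)  # 113
for language, percent in shares.items():
    print(f"{language}: {percent}% (n={counts[language]})")
```

Under this assumption the output matches the abstract: Norwegian 18%, Swedish 64%, Danish 10%, and 8% for studies covering more than one language.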

Problem

Research questions and friction points this paper is trying to address.

Assessing NLP methods for Scandinavian clinical texts

Identifying disparities in NLP adoption across languages

Evaluating resource sharing in clinical NLP research

Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic review of Scandinavian clinical NLP

Disparities in the adoption of transformer-based models

Low levels of resource sharing in clinical NLP
