ADMEDTAGGER: an annotation framework for distillation of expert knowledge for the Polish medical language

📅 2025-12-27

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

145K/year

🤖 AI Summary

This study addresses the scarcity of expert annotators for Polish medical text and the consequent difficulty in constructing large-scale, multi-class labeled datasets. To overcome this challenge, the authors propose an efficient annotation framework based on knowledge distillation from large language models. They leverage the multilingual Llama-3.1 as a teacher model to automatically annotate a vast corpus of Polish medical texts, followed by limited human validation to produce high-quality training and test sets. Lightweight BERT variants—including DistilBERT, BioBERT, and HerBERT—are then trained on this data. Experimental results demonstrate that DistilBERT achieves F1 scores exceeding 0.80 across all five clinical categories, with three surpassing 0.93, while reducing model size by nearly 500-fold, GPU memory consumption by 300-fold, and accelerating inference by hundreds of times—enabling highly accurate and efficient medical text classification at minimal annotation cost.

📝 Abstract

In this work, we present an annotation framework that demonstrates how a multilingual LLM pretrained on a large corpus can be used as a teacher model to distill the expert knowledge needed for tagging medical texts in Polish. This work is part of a larger project called ADMEDVOICE, within which we collected an extensive corpus of medical texts representing five clinical categories - Radiology, Oncology, Cardiology, Hypertension, and Pathology. Using this data, we had to develop a multi-class classifier, but the fundamental problem turned out to be the lack of resources for annotating an adequate number of texts. Therefore, in our solution, we used the multilingual Llama3.1 model to annotate an extensive corpus of medical texts in Polish. Using our limited annotation resources, we verified only a portion of these labels, creating a test set from them. The data annotated in this way were then used for training and validation of 3 different types of classifiers based on the BERT architecture - the distilled DistilBERT model, BioBERT fine-tuned on medical data, and HerBERT fine-tuned on the Polish language corpus. Among the models we trained, the DistilBERT model achieved the best results, reaching an F1 score>0.80 for each clinical category and an F1 score>0.93 for 3 of them. In this way, we obtained a series of highly effective classifiers that represent an alternative to large language models, due to their nearly 500 times smaller size, 300 times lower GPU VRAM consumption, and several hundred times faster inference.

Problem

Research questions and friction points this paper is trying to address.

annotation scarcity

medical text classification

Polish language

expert knowledge distillation

low-resource NLP

Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge distillation

medical text classification

low-resource language