Leveraging Large Language Models for Rare Disease Named Entity Recognition

📅 2025-08-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Rare disease named entity recognition (NER) faces core challenges including scarce annotated data, semantic ambiguity, and long-tailed entity distributions. To address these, we propose a domain-knowledge-enhanced structured prompting framework for large language models (LLMs), featuring two semantics-guided in-context example selection strategies to strengthen few-shot learning under low-resource conditions. Our approach integrates zero-shot prompting, retrieval-augmented generation (RAG), and task-level fine-tuning, while incorporating medical knowledge encoding and entity disambiguation rules. Evaluated on the RareDis Corpus, our few-shot prompting method substantially reduces annotation cost while achieving strong performance; task-level fine-tuning attains a new state-of-the-art (SOTA). Notably, GPT-4o matches or surpasses BioClinicalBERT—a specialized biomedical language model—on this task. This work presents the first systematic empirical validation of synergistic optimization between structured prompting and domain knowledge for enhancing LLMs in rare disease NER.

📝 Abstract
Named Entity Recognition (NER) in the rare disease domain poses unique challenges due to limited labeled data, semantic ambiguity between entity types, and long-tail distributions. In this study, we evaluate the capabilities of GPT-4o for rare disease NER under low-resource settings, using a range of prompt-based strategies including zero-shot prompting, few-shot in-context learning, retrieval-augmented generation (RAG), and task-level fine-tuning. We design a structured prompting framework that encodes domain-specific knowledge and disambiguation rules for four entity types. We further introduce two semantically guided few-shot example selection methods to improve in-context performance while reducing labeling effort. Experiments on the RareDis Corpus show that GPT-4o achieves competitive or superior performance compared to BioClinicalBERT, with task-level fine-tuning yielding new state-of-the-art (SOTA) results. Cost-performance analysis reveals that few-shot prompting delivers high returns at low token budgets, while RAG offers marginal additional benefit. An error taxonomy highlights common failure modes such as boundary drift and type confusion, suggesting opportunities for post-processing and hybrid refinement. Our results demonstrate that prompt-optimized LLMs can serve as effective, scalable alternatives to traditional supervised models in biomedical NER, particularly in rare disease applications where annotated data is scarce.
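The semantically guided few-shot selection described above can be sketched as ranking annotated examples by similarity to the input sentence and using the top-k as in-context demonstrations. This is a minimal illustration, not the paper's actual method: the bag-of-words `embed` function stands in for a real semantic encoder, and the example pool and entity labels are invented for demonstration.

```python
# Sketch of semantically guided few-shot example selection: rank the
# annotated pool by cosine similarity to the query sentence, keep top-k.
# NOTE: embed() is a bag-of-words stand-in for a real sentence encoder;
# the pool and labels below are hypothetical.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_examples(query: str, pool: list, k: int = 2) -> list:
    """Return the k annotated examples most similar to the query."""
    q = embed(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, embed(ex["text"])), reverse=True)
    return ranked[:k]

# Hypothetical annotated pool (not from the RareDis Corpus).
pool = [
    {"text": "Fabry disease causes renal failure", "entities": [("Fabry disease", "RAREDISEASE")]},
    {"text": "The weather was cold yesterday", "entities": []},
    {"text": "Gaucher disease presents with anemia", "entities": [("Gaucher disease", "RAREDISEASE")]},
]
chosen = select_examples("Pompe disease leads to muscle weakness", pool, k=2)
```

In this sketch the two disease sentences outrank the off-topic one, mirroring the goal of picking demonstrations that share semantics with the target sentence so the LLM sees relevant annotation patterns at low labeling cost.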
Problem

Research questions and friction points this paper is trying to address.

Addressing rare disease NER challenges with limited labeled data
Evaluating GPT-4o performance in low-resource NER settings
Improving entity disambiguation and example selection methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes GPT-4o for rare disease NER
Introduces structured prompting with domain rules
Employs semantically guided few-shot selection
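The structured prompting idea in the bullets above can be illustrated as assembling entity definitions, disambiguation rules, and in-context examples into one prompt. A minimal sketch follows; the entity definitions, rule wording, and annotation format here are assumptions for illustration, not the paper's exact prompt or the RareDis annotation scheme.

```python
# Hypothetical structured NER prompt: entity-type definitions plus a
# disambiguation rule, followed by few-shot examples and the target
# sentence. Definitions and rule text below are illustrative assumptions.
ENTITY_RULES = {
    "RAREDISEASE": "A disease affecting a small fraction of the population.",
    "DISEASE": "A general (non-rare) disease or disorder.",
    "SYMPTOM": "A manifestation reported by the patient.",
    "SIGN": "A manifestation observed or measured by a clinician.",
}

def build_prompt(sentence: str, examples: list) -> str:
    """Assemble a structured prompt: rules, few-shot examples, query."""
    lines = ["Extract entities of the following types:"]
    lines += [f"- {name}: {definition}" for name, definition in ENTITY_RULES.items()]
    lines.append("Disambiguation rule: prefer RAREDISEASE over DISEASE when both apply.")
    for text, annotation in examples:
        lines.append(f"Sentence: {text}\nEntities: {annotation}")
    lines.append(f"Sentence: {sentence}\nEntities:")
    return "\n".join(lines)

prompt = build_prompt(
    "Patients with Pompe disease show progressive muscle weakness.",
    [("Fabry disease causes renal failure.", "[('Fabry disease', 'RAREDISEASE')]")],
)
```

The returned string would then be sent to the LLM (e.g. GPT-4o via an API call, omitted here), with the model expected to continue after the final `Entities:` marker.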
Nan Miles Xi
Senior Manager in Statistics, AbbVie
Statistical Learning · Computational Genomics · Pharmaceutical Statistics · Health Informatics
Yu Deng
Data and Statistical Sciences, AbbVie Inc., North Chicago, IL 60064, USA
Lin Wang
Department of Statistics, Purdue University, West Lafayette, IN 47907, USA