🤖 AI Summary
Existing phenotypic concept recognition methods rely heavily on ontology-specific supervised training and generalize poorly across diverse clinical texts and evolving biomedical terminology. To address this, we propose a three-stage, ontology-agnostic prompting framework: (1) entity mention extraction combining rule-based and neural tagging; (2) SapBERT-driven candidate retrieval; and (3) large language model–guided entity linking. Our approach removes strong dependencies on manually annotated data and ontology schemata, substantially improving transferability across heterogeneous text genres, ontologies, and emerging terms. Evaluated on four benchmark datasets, it achieves state-of-the-art performance on both mention-level and document-level metrics. The core innovation lies in integrating lightweight prompt engineering with domain-adapted retrieval, enabling robust, low-dependency phenotypic concept recognition without ontology customization or extensive labeled resources.
📝 Abstract
Phenotype concept recognition (CR) is a fundamental task in biomedical text mining, enabling applications such as clinical diagnostics and knowledge graph construction. However, existing methods often require ontology-specific training and struggle to generalize across diverse text types and evolving biomedical terminology. We present AutoPCR, a prompt-based phenotype CR method that does not require ontology-specific training. AutoPCR performs CR in three stages: entity extraction using a hybrid of rule-based and neural tagging strategies, candidate retrieval via SapBERT, and entity linking through prompting a large language model. Experiments on four benchmark datasets show that AutoPCR achieves the best average and most robust performance across both mention-level and document-level evaluations, surpassing prior state-of-the-art methods. Further ablation and transfer studies demonstrate its inductive capability and generalizability to new ontologies.
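The three-stage pipeline described in the abstract can be sketched as follows. This is a minimal, illustrative mock-up, not AutoPCR's implementation: the ontology entries, embedding vectors, and the `cosine` helper are all toy stand-ins (the real method embeds mentions and ontology terms with SapBERT and sends the prompt to an LLM, both omitted here).

```python
import numpy as np

# Stage 1 (entity extraction) is assumed done: a phenotype mention has been
# tagged in the input text by rule-based and neural taggers.
mention = "short stature"

# Stage 2: candidate retrieval by nearest-neighbor search in embedding space.
# Toy 4-d vectors stand in for SapBERT embeddings of HPO terms.
ontology = {
    "HP:0004322 Short stature":       np.array([0.9, 0.1, 0.0, 0.1]),
    "HP:0000098 Tall stature":        np.array([0.1, 0.9, 0.1, 0.0]),
    "HP:0001263 Developmental delay": np.array([0.0, 0.1, 0.9, 0.2]),
}
mention_vec = np.array([0.85, 0.15, 0.05, 0.1])  # toy embedding of the mention

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank ontology terms by similarity to the mention and keep the top-k.
ranked = sorted(ontology, key=lambda c: cosine(mention_vec, ontology[c]),
                reverse=True)
candidates = ranked[:2]

# Stage 3: entity linking by prompting an LLM to pick among the candidates
# (the actual LLM call is omitted; only the prompt is constructed).
prompt = (
    f"Mention: {mention}\n"
    "Candidates:\n" + "\n".join(f"- {c}" for c in candidates) + "\n"
    "Answer with the best-matching ontology ID, or 'none' if no match."
)
print(candidates[0])  # top-ranked candidate passed to the LLM
```

Restricting the LLM to a short retrieved candidate list is what keeps the linking step ontology-agnostic: swapping in a different ontology only changes the embedded term inventory, not the prompt or any trained component.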