🤖 AI Summary
Existing phenotypic concept recognition methods rely heavily on ontology-specific supervised training and generalize poorly across diverse clinical texts and evolving biomedical terminology. To address this, we propose a three-stage, ontology-agnostic prompting framework: (1) entity mention extraction combining rule-based and neural tagging; (2) SapBERT-driven candidate retrieval; and (3) large language model–guided entity linking. Our approach removes strong dependencies on manually annotated data and ontology schemata, substantially improving transferability across heterogeneous text genres, ontologies, and emerging terms. Evaluated on four benchmark datasets, it achieves state-of-the-art performance on both mention-level and document-level metrics. The core innovation lies in integrating lightweight prompt engineering with domain-adapted retrieval, enabling robust, low-dependency phenotypic concept recognition without ontology customization or extensive labeled resources.
📝 Abstract
Phenotype concept recognition (CR) is a fundamental task in biomedical text mining, enabling applications such as clinical diagnostics and knowledge graph construction. However, existing methods often require ontology-specific training and struggle to generalize across diverse text types and evolving biomedical terminology. We present AutoPCR, a prompt-based phenotype CR method that does not require ontology-specific training. AutoPCR performs CR in three stages: entity extraction using a hybrid of rule-based and neural tagging strategies, candidate retrieval via SapBERT, and entity linking through prompting a large language model. Experiments on four benchmark datasets show that AutoPCR achieves the best average and most robust performance across both mention-level and document-level evaluations, surpassing prior state-of-the-art methods. Further ablation and transfer studies demonstrate its inductive capability and generalizability to new ontologies.
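The three-stage pipeline described in the abstract can be sketched as follows. This is a minimal, illustrative mock-up, not AutoPCR's implementation: the ontology entries, embedding vectors, and the `cosine` helper are all toy stand-ins (the real method embeds mentions and ontology terms with SapBERT and sends the prompt to an LLM, both omitted here).

```python
import numpy as np

# Stage 1 (entity extraction) is assumed done: a phenotype mention has been
# tagged in the input text by rule-based and neural taggers.
mention = "short stature"

# Stage 2: candidate retrieval by nearest-neighbor search in embedding space.
# Toy 4-d vectors stand in for SapBERT embeddings of HPO terms.
ontology = {
    "HP:0004322 Short stature":       np.array([0.9, 0.1, 0.0, 0.1]),
    "HP:0000098 Tall stature":        np.array([0.1, 0.9, 0.1, 0.0]),
    "HP:0001263 Developmental delay": np.array([0.0, 0.1, 0.9, 0.2]),
}
mention_vec = np.array([0.85, 0.15, 0.05, 0.1])  # toy embedding of the mention

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank ontology terms by similarity to the mention and keep the top-k.
ranked = sorted(ontology, key=lambda c: cosine(mention_vec, ontology[c]),
                reverse=True)
candidates = ranked[:2]

# Stage 3: entity linking by prompting an LLM to pick among the candidates
# (the actual LLM call is omitted; only the prompt is constructed).
prompt = (
    f"Mention: {mention}\n"
    "Candidates:\n" + "\n".join(f"- {c}" for c in candidates) + "\n"
    "Answer with the best-matching ontology ID, or 'none' if no match."
)
print(candidates[0])  # top-ranked candidate passed to the LLM
```

Restricting the LLM to a short retrieved candidate list is what keeps the linking step ontology-agnostic: swapping in a different ontology only changes the embedded term inventory, not the prompt or any trained component.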