🤖 AI Summary
This study addresses the automated identification of immune checkpoint inhibitor (ICI)-related cancer studies in the Gene Expression Omnibus (GEO) database, tackling three key challenges: semantic ambiguity, extreme class imbalance, and scarcity of labeled data. We propose the first few-shot text classification framework integrating prototypical networks with Low-Rank Adaptation (LoRA): leveraging PubMedBERT, we design an episodic prototype-based training strategy to enhance inter-class discriminability while preserving domain knowledge via efficient low-rank fine-tuning. Evaluated on a real-world GEO subset, our method achieves an F1-score of 0.624 (recall: 0.887), substantially outperforming standard fine-tuning. Applied to 44,000 unlabeled studies, it reduces manual curation effort by 82%. Our core contribution is the pioneering application of meta-learning to biomedical literature annotation—achieving strong generalization, model interpretability, and deployment efficiency.
📝 Abstract
Identifying immune checkpoint inhibitor (ICI) studies in genomic repositories like Gene Expression Omnibus (GEO) is vital for cancer research yet remains challenging due to semantic ambiguity, extreme class imbalance, and limited labeled data in low-resource settings. We present ProtoBERT-LoRA, a hybrid framework that combines PubMedBERT with prototypical networks and Low-Rank Adaptation (LoRA) for efficient fine-tuning. The model enforces class-separable embeddings via episodic prototype training while preserving biomedical domain knowledge. Our dataset was divided as: Training (20 positive, 20 negative), Prototype Set (10 positive, 10 negative), Validation (20 positive, 200 negative), and Test (71 positive, 765 negative). Evaluated on test dataset, ProtoBERT-LoRA achieved F1-score of 0.624 (precision: 0.481, recall: 0.887), outperforming the rule-based system, machine learning baselines and finetuned PubMedBERT. Application to 44,287 unlabeled studies reduced manual review efforts by 82%. Ablation studies confirmed that combining prototypes with LoRA improved performance by 29% over stand-alone LoRA.