Cancer Vaccine Adjuvant Name Recognition from Biomedical Literature using Large Language Models

📅 2025-02-12
🤖 AI Summary
This study addresses the difficulty and high annotation cost of identifying adjuvant names in cancer vaccine research. We systematically evaluate GPT-4o and Llama-3.2 for zero-shot and few-shot adjuvant entity extraction from clinical trial records and literature abstracts. The prompting framework combines explicit instructions that target adjuvant names with added context (e.g., intervention-related cues), and outputs are checked through both automated and manual validation. GPT-4o achieves maximum F1-scores of 77.32% on VAC and 81.67% on AdjuvareDB with 100% precision, compared with a maximum F1-score of 65.62% for Llama-3.2-3B on AdjuvareDB and a margin of approximately 2% on VAC. Adding contextual information, particularly interventions, improves recall. The work points toward a large language model-driven approach to adjuvant knowledge extraction and immunotherapy-related text mining.

📝 Abstract
Motivation: An adjuvant is a chemical incorporated into vaccines that enhances their efficacy by improving the immune response. Identifying adjuvant names in cancer vaccine studies is essential for furthering research and enhancing immunotherapies. However, manual curation from the constantly expanding biomedical literature poses significant challenges. This study explores the automated recognition of vaccine adjuvant names using Large Language Models (LLMs), specifically Generative Pretrained Transformers (GPT) and Large Language Model Meta AI (Llama).

Methods: We utilized two datasets: 97 clinical trial records from AdjuvareDB and 290 abstracts annotated with the Vaccine Adjuvant Compendium (VAC). GPT-4o and Llama 3.2 were employed in zero-shot and few-shot learning paradigms with up to four examples per prompt. Prompts explicitly targeted adjuvant names, and we tested the impact of adding contextual information such as substances or interventions. Outputs underwent automated and manual validation for accuracy and consistency.

Results: GPT-4o attained 100% precision in all settings, with notable improvements in recall and F1-score, particularly when interventions were included in the prompt. On the VAC dataset, GPT-4o achieved a maximum F1-score of 77.32% with interventions, surpassing Llama-3.2-3B by approximately 2%. On the AdjuvareDB dataset, GPT-4o reached an F1-score of 81.67% for three-shot prompting with interventions, surpassing Llama-3.2-3B's maximum F1-score of 65.62%.

Conclusion: Our findings demonstrate that LLMs excel at identifying adjuvant names, including rare naming variants. This study emphasizes the capability of LLMs to support cancer vaccine development by efficiently extracting insights from text. Future work aims to extend the framework to a broader range of biomedical literature and to improve model generalizability across vaccines and adjuvants.
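To make the reported metrics concrete, the following is a minimal sketch of set-based precision, recall, and F1 for extracted adjuvant names. The matching rule (case- and whitespace-insensitive exact match), the function names, and the example names are assumptions for illustration; the abstract does not specify how matches were scored or aggregated. The helper `recall_from_f1` simply inverts the F1 formula, which shows that an F1-score of 77.32% at 100% precision would correspond to a recall of roughly 63%.

```python
from typing import Iterable, Tuple


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace (an assumed matching rule)."""
    return " ".join(name.lower().split())


def prf1(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 over normalized names."""
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def recall_from_f1(f1: float, precision: float = 1.0) -> float:
    """Invert F1 = 2PR / (P + R) to get R = F1 * P / (2P - F1)."""
    return f1 * precision / (2 * precision - f1)


if __name__ == "__main__":
    # Illustrative names only; not taken from the paper's datasets.
    predicted = ["Montanide ISA-51", "GM-CSF"]
    gold = ["montanide isa-51", "GM-CSF", "poly-ICLC"]
    print(prf1(predicted, gold))
    print(round(recall_from_f1(0.7732), 4))
```

In this toy case every predicted name is correct but one gold name is missed, which mirrors the pattern described in the results: precision stays at 100% while recall limits the F1-score.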
Problem

Research questions and friction points this paper is trying to address.

Automated recognition of cancer vaccine adjuvant names
Applying Large Language Models to entity extraction from biomedical literature
Enhancing immunotherapy research through adjuvant identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes GPT-4o and Llama 3.2 for recognition
Applies zero-shot and few-shot learning
Automates adjuvant name extraction effectively
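The zero-shot and few-shot setup listed above can be sketched as a prompt builder. This is only an illustration under stated assumptions: the instruction wording, the `Text:` / `Interventions:` / `Adjuvants:` layout, and the names `build_prompt` and `INSTRUCTION` are hypothetical and are not the prompts used in the paper. The cap of four examples follows the abstract; no model API is called here.

```python
from typing import Optional, Sequence, Tuple

# (example text, adjuvant names found in it); hypothetical structure
Example = Tuple[str, Sequence[str]]

# Hypothetical instruction wording, not the paper's actual prompt.
INSTRUCTION = (
    "Extract the names of vaccine adjuvants mentioned in the passage below. "
    "Answer with a comma-separated list of names, or NONE if no adjuvant is mentioned."
)


def build_prompt(
    text: str,
    examples: Sequence[Example] = (),
    interventions: Optional[Sequence[str]] = None,
    max_shots: int = 4,
) -> str:
    """Build a zero-shot (no examples) or few-shot prompt as plain text."""
    if len(examples) > max_shots:
        raise ValueError(f"at most {max_shots} examples per prompt")
    parts = [INSTRUCTION]
    for ex_text, ex_names in examples:
        answer = ", ".join(ex_names) or "NONE"
        parts.append(f"Text: {ex_text}\nAdjuvants: {answer}")
    query = f"Text: {text}"
    if interventions:
        # Optional context cue, e.g. the interventions field of a trial record.
        query += "\nInterventions: " + ", ".join(interventions)
    parts.append(query + "\nAdjuvants:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    shots = [("Peptide vaccine emulsified in Montanide ISA-51.", ["Montanide ISA-51"])]
    print(build_prompt(
        "Patients received the vaccine together with GM-CSF.",
        examples=shots,
        interventions=["Biological: peptide vaccine", "Drug: GM-CSF"],
    ))
```

Passing an empty `examples` sequence yields the zero-shot variant, and omitting `interventions` gives the prompt without the added context, so the same function covers the conditions compared in the abstract.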
Hasin Rehana
Graduate Research Assistant, University of North Dakota
Machine Learning, Deep Learning, Natural Language Processing, Data Mining, Bioinformatics
Jie Zheng
Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, 48109, USA
Leo Yeh
Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, 48109, USA
Benu Bansal
School of Electrical Engineering & Computer Science, University of North Dakota, Grand Forks, North Dakota, 58202, USA; Department of Biomedical Engineering, University of North Dakota, Grand Forks, North Dakota, 58202, USA
Nur Bengisu Çam
Department of Computer Engineering, Bogazici University, 34342 Istanbul, Turkey
Christianah Jemiyo
Department of Biomedical Sciences, University of North Dakota School of Medicine and Health Sciences, Grand Forks, North Dakota, 58202, USA
Brett A. McGregor
Department of Biomedical Sciences, University of North Dakota School of Medicine and Health Sciences, Grand Forks, North Dakota, 58202, USA
Arzucan Özgür
Department of Computer Engineering, Bogazici University, 34342 Istanbul, Turkey
Yongqun He
Professor, University of Michigan Medical School
Biomedical Ontology, Vaccine Informatics, Nephrology, Brucellosis, COVID-19
Junguk Hur
Department of Biomedical Sciences, University of North Dakota School of Medicine, 1301 N Columbia Rd., Grand Forks, ND 58202