ImmunoFOMO: Are Language Models missing what oncologists see?

📅 2025-06-13

🤖 AI Summary

This study investigates large language models’ (LLMs’) ability to comprehend fine-grained immunotherapeutic biomarkers—such as PD-L1 and tumor mutational burden (TMB)—in breast cancer, using oncologist-annotated labels as the gold standard. Method: We systematically evaluate BERT, BioBERT, LLaMA, and GPT-series models on an expert-curated dataset of breast cancer literature abstracts, employing zero-shot and few-shot prompting, embedding similarity analysis, and concept activation mapping. Contribution/Results: Domain-adapted smaller models—particularly BioBERT—achieve 78% accuracy, significantly outperforming GPT-4 (62%) and LLaMA-3 (59%). This constitutes the first empirical evidence that pre-trained compact models can surpass general-purpose LLMs in specific clinical concept recognition. Based on these findings, we propose ImmunoFOMO, a clinical-cognitive alignment evaluation framework that challenges the prevailing “scale-driven performance” paradigm in medical AI, underscoring the critical importance of domain knowledge integration and task-specific alignment.

📝 Abstract

The capabilities of language models (LMs) have grown at a fast pace over the past decade, leading researchers in various disciplines, such as biomedical research, to increasingly explore the utility of LMs in their day-to-day applications. Domain-specific language models are already in use for biomedical natural language processing (NLP) applications. Recently, however, interest has grown in medical language models and their capacity for understanding. In this paper, we investigate the medical conceptual grounding of various language models against expert clinicians in identifying hallmarks of immunotherapy in breast cancer abstracts. Our results show that pre-trained language models have the potential to outperform large language models in identifying very specific (low-level) concepts.
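The comparison against expert clinicians described in the abstract can be pictured as scoring a model's labels against clinician labels, one concept at a time. The following is a minimal sketch under stated assumptions, not the paper's evaluation code: the function name `per_concept_accuracy`, the abstract IDs, the placeholder concept names `hallmark_a` and `hallmark_b`, and the toy labels are all hypothetical, and plain accuracy is used only as an illustrative metric.

```python
from collections import defaultdict


def per_concept_accuracy(gold, predicted):
    """Fraction of abstracts, per concept, where the model label matches the clinician label.

    gold / predicted: {abstract_id: {concept_name: bool}} (hypothetical layout).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for abstract_id, concept_labels in gold.items():
        for concept, label in concept_labels.items():
            totals[concept] += 1
            # A missing model prediction is counted as a miss.
            if predicted.get(abstract_id, {}).get(concept) == label:
                hits[concept] += 1
    return {concept: hits[concept] / totals[concept] for concept in totals}


# Toy clinician annotations (hypothetical): does the abstract mention the concept?
clinician_labels = {
    "abs1": {"hallmark_a": True, "hallmark_b": False},
    "abs2": {"hallmark_a": False, "hallmark_b": True},
}

# Toy model outputs (hypothetical) for the same abstracts.
model_labels = {
    "abs1": {"hallmark_a": True, "hallmark_b": True},
    "abs2": {"hallmark_a": False, "hallmark_b": True},
}

scores = per_concept_accuracy(clinician_labels, model_labels)
for concept, accuracy in sorted(scores.items()):
    print(f"{concept}: {accuracy:.2f}")
```

Reporting the score per concept rather than as one pooled number keeps very specific (low-level) concepts visible, which is the level at which the abstract says pre-trained models may have an edge.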

Problem

Research questions and friction points this paper is trying to address.

Assessing LMs' understanding of medical concepts against that of expert clinicians

Evaluating LMs for immunotherapy hallmark identification

Comparing pre-trained and large LMs on specific concepts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates medical conceptual grounding of LMs

Compares LMs with expert clinicians

Pre-trained LMs show potential to outperform large LMs on very specific (low-level) concepts
