Clinical trial cohort selection using Large Language Models on n2c2 Challenges

📅 2025-01-19
🤖 AI Summary
Problem: Prior approaches are limited in the fine-grained medical logical reasoning that clinical trial eligibility screening requires. Method: This study presents the first systematic evaluation of mainstream large language models (LLMs) on this task, proposing a zero-shot/few-shot prompting framework built upon LLaMA and GPT architectures, combined with the n2c2 standardized clinical text corpus and rule-augmented prompting strategies. Contribution/Results: Experiments show that LLMs achieve >85% accuracy on simple inclusion/exclusion criterion identification, substantially outperforming conventional NLP methods. However, performance degrades by over 30% on complex criteria requiring deep medical reasoning, revealing a critical bottleneck: strong coarse-grained pattern matching but limited fine-grained clinical inference. This work establishes an empirical benchmark for LLMs’ capabilities in real-world clinical decision support and provides methodological insights for advancing medically grounded reasoning in foundation models.
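
To make the zero-shot/few-shot setup mentioned in the summary more concrete, here is a minimal Python sketch of how a prompt for a single eligibility criterion could be assembled and how a free-text model reply could be mapped back to a label. It is an illustration under stated assumptions, not the paper's actual prompts: the names `Example`, `build_prompt`, and `parse_answer`, the prompt wording, and the "met" / "not met" label strings are all hypothetical, and no model is called.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Example:
    """A labeled demonstration for few-shot prompting (hypothetical structure)."""
    note: str   # excerpt of a patient record
    label: str  # "met" or "not met"


def build_prompt(criterion: str, note: str, examples: Sequence[Example] = ()) -> str:
    """Assemble a prompt for one criterion.

    With no examples this is a zero-shot prompt; passing examples
    turns it into a few-shot prompt. The wording is an assumption.
    """
    parts = [
        "You are screening patients for a clinical trial.",
        f"Criterion: {criterion}",
        'Answer with exactly "met" or "not met".',
    ]
    # Each demonstration shows a record followed by its gold answer.
    for ex in examples:
        parts.append(f"Record: {ex.note}\nAnswer: {ex.label}")
    # The target record is left unanswered for the model to complete.
    parts.append(f"Record: {note}\nAnswer:")
    return "\n\n".join(parts)


def parse_answer(text: str) -> str:
    """Map a free-text model reply to "met", "not met", or "unknown"."""
    cleaned = text.strip().lower()
    # Check "not met" first, since it is the longer, more specific prefix.
    if cleaned.startswith("not met"):
        return "not met"
    if cleaned.startswith("met"):
        return "met"
    return "unknown"


if __name__ == "__main__":
    demos = [
        Example("Patient denies alcohol use.", "not met"),
        Example("Drinks above recommended weekly limits.", "met"),
    ]
    print(build_prompt("Current alcohol use over weekly limits",
                       "Reports occasional wine with dinner.", demos))
```

Keeping prompt construction separate from answer parsing means the same criterion text can be evaluated in zero-shot and few-shot mode by changing only the `examples` argument, and replies that match neither label are surfaced as "unknown" rather than silently counted.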

📝 Abstract
Clinical trials are a critical process in the medical field for introducing new treatments and innovations. However, cohort selection for clinical trials is a time-consuming process that often requires manual review of patient text records for specific keywords. Though there have been studies on standardizing the information across the various platforms, Natural Language Processing (NLP) tools remain crucial for spotting eligibility criteria in textual reports. Recently, pre-trained large language models (LLMs) have gained popularity for various NLP tasks due to their ability to acquire a nuanced understanding of text. In this paper, we study the performance of large language models on clinical trial cohort selection and leverage the n2c2 challenges to benchmark their performance. Our results are promising with regard to the incorporation of LLMs for simple cohort selection tasks, but also highlight the difficulties encountered by these models as soon as fine-grained knowledge and reasoning are required.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Clinical Trial Participant Screening
Efficiency and Accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Clinical Trial Participant Screening
Medical Knowledge Processing
Chi-en Amy Tai
University of Waterloo, Vision and Image Processing Lab, VIP, 200 University Avenue West, Waterloo, N2L3G1, Canada
Xavier Tannier
Sorbonne Université, Limics
Natural Language Processing, Information Extraction, BioNLP