🤖 AI Summary
This study addresses the challenge of automatically generating accurate, concise, and interpretable computable phenotypes (CPs) for clinical decision support in treatment-resistant hypertension using large language models (LLMs). We propose a "Synthesize–Execute–Debug–Instruct" iterative learning framework tailored to six key clinical phenotypes, integrating program synthesis, execution-based validation, and few-shot feedback to drastically reduce reliance on large-scale annotated data. Our method automatically translates domain-specific medical knowledge into executable, verifiable clinical logic programs. Empirically, it achieves accuracy comparable to state-of-the-art machine learning models while ensuring strong interpretability through transparent, rule-based reasoning. Experiments demonstrate that clinically deployable performance is attainable with only a minimal number of labeled examples—fewer than 10 per phenotype—thereby establishing a novel paradigm for CP development in low-resource settings.
📝 Abstract
Large language models (LLMs) have demonstrated remarkable capabilities for medical question answering and programming, but their potential for generating interpretable computable phenotypes (CPs) is under-explored. In this work, we investigate whether LLMs can generate accurate and concise CPs for six clinical phenotypes of varying complexity, which could be leveraged to enable scalable clinical decision support to improve care for patients with hypertension. In addition to evaluating zero-shot performance, we propose and test a synthesize-execute-debug-instruct strategy that uses LLMs to generate and iteratively refine CPs using data-driven feedback. Our results show that LLMs, coupled with iterative learning, can generate interpretable and reasonably accurate programs that approach the performance of state-of-the-art ML methods while requiring significantly fewer training examples.
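The synthesize-execute-debug-instruct loop can be sketched roughly as follows. This is a hypothetical illustration, not the paper's implementation: `call_llm` stands in for a real LLM API and is stubbed here to return progressively refined rule programs, and the feature names (`sbp`, `num_meds`) and thresholds are invented for the toy example.

```python
# Hypothetical sketch of a synthesize-execute-debug-instruct loop for
# computable-phenotype generation. The "LLM" is stubbed so the loop runs.
from typing import Callable, Dict, List, Tuple

# A handful of labeled patient records (few-shot, as in the abstract).
LabeledExample = Tuple[Dict[str, float], bool]
EXAMPLES: List[LabeledExample] = [
    ({"sbp": 150, "num_meds": 4}, True),   # phenotype present
    ({"sbp": 128, "num_meds": 1}, False),
    ({"sbp": 145, "num_meds": 3}, True),
    ({"sbp": 132, "num_meds": 2}, False),
]

# Stubbed "LLM": returns candidate phenotype programs as Python source,
# pretending each round of feedback yields a better draft.
CANDIDATES = [
    "def phenotype(p):\n    return p['sbp'] > 160",  # too strict
    "def phenotype(p):\n    return p['sbp'] >= 140 and p['num_meds'] >= 3",
]

def call_llm(instruction: str, attempt: int) -> str:
    return CANDIDATES[min(attempt, len(CANDIDATES) - 1)]

def execute(program_src: str,
            examples: List[LabeledExample]) -> Tuple[float, List[str]]:
    """Execute step: compile the generated program, score it on examples."""
    scope: dict = {}
    exec(program_src, scope)
    phenotype: Callable = scope["phenotype"]
    errors, correct = [], 0
    for features, label in examples:
        pred = phenotype(features)
        if pred == label:
            correct += 1
        else:
            errors.append(f"input={features} expected={label} got={pred}")
    return correct / len(examples), errors

def refine_phenotype(max_iters: int = 5) -> Tuple[str, float]:
    instruction = "Write phenotype(p) for treatment-resistant hypertension."
    src, accuracy = "", 0.0
    for attempt in range(max_iters):
        src = call_llm(instruction, attempt)         # synthesize
        accuracy, errors = execute(src, EXAMPLES)    # execute / debug
        if not errors:
            break
        # instruct: fold misclassified cases back into the next prompt
        instruction += "\nFix these errors:\n" + "\n".join(errors)
    return src, accuracy

if __name__ == "__main__":
    src, acc = refine_phenotype()
    print(acc)  # 1.0 on these toy examples
```

The key design point illustrated here is that feedback is data-driven: only execution failures on the labeled examples, not hand-written rules, are folded back into the prompt, which is why so few labeled examples suffice.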