🤖 AI Summary
Current biomedical vision-language models (VLMs) suffer from limited interpretability and monolithic prompting in clinical diagnosis: existing prompt optimization methods either produce opaque implicit vectors or generate a single textual prompt, failing to capture the multifaceted observational basis of clinical decision-making and thus hindering trustworthy deployment in high-stakes settings. To address this, we propose an evolutionary framework for optimizing diverse, natural-language prompts. Leveraging large language models (LLMs), our method generates semantically rich, clinically aligned prompt ensembles; it further integrates VLM and LLM knowledge distillation to build an interpretable, multi-prompt ensemble system. Evaluated across multiple biomedical benchmarks, our approach significantly outperforms state-of-the-art prompt-tuning techniques, especially under few-shot conditions, yielding measurable gains in diagnostic accuracy and clinical feature consistency. To our knowledge, this is the first work to enable transparent, robust, multi-granular clinical reasoning support through explainable, heterogeneous prompting.
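The evolutionary loop described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `score_pair` stands in for the few-shot diagnostic accuracy a VLM would assign to a (positive, negative) prompt pair, and `llm_mutate` stands in for the LLM rewriting high-fitness pairs; both names and the keyword list are hypothetical.

```python
import random

# Toy "clinical feature" vocabulary; in the real system, fitness would be
# the accuracy of a VLM zero-shot classifier using each prompt pair.
FEATURES = ["irregular borders", "asymmetry", "color variegation", "diffuse opacity"]

def score_pair(pair):
    """Stand-in fitness: count clinical-feature keywords in the positive prompt."""
    pos, _neg = pair
    return sum(f in pos for f in FEATURES)

def llm_mutate(parent_a, parent_b, rng):
    """Stand-in for the LLM mutation/crossover step: combine a parent's
    positive prompt with a sampled feature and another parent's negative prompt."""
    pos = parent_a[0] + ", " + rng.choice(FEATURES)
    return (pos, parent_b[1])

def evolve(population, generations=10, seed=0):
    rng = random.Random(seed)
    for _ in range(generations):
        population.sort(key=score_pair, reverse=True)
        parents = population[: max(2, len(population) // 2)]  # elitist selection
        child = llm_mutate(rng.choice(parents), rng.choice(parents), rng)
        population[-1] = child  # replace the weakest pair
    # Final output is a multi-prompt ensemble, mirroring the heterogeneous
    # prompt-pair ensemble the framework produces.
    population.sort(key=score_pair, reverse=True)
    return population[:3]

seed_pairs = [
    ("a dermoscopic image of a malignant lesion", "a dermoscopic image of a benign lesion"),
    ("a photo of diseased tissue", "a photo of healthy tissue"),
    ("a scan showing pathology", "a scan showing no pathology"),
    ("an image of an abnormal finding", "an image of a normal finding"),
]
ensemble = evolve(list(seed_pairs))
```

Because each evolved pair remains a readable natural-language sentence, the resulting ensemble can be inspected directly, which is the interpretability property the summary emphasizes.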
📝 Abstract
The clinical adoption of biomedical vision-language models is hindered by prompt optimization techniques that produce either uninterpretable latent vectors or single textual prompts. This lack of transparency, and the failure to capture the multi-faceted nature of clinical diagnosis, which relies on integrating diverse observations, limits their trustworthiness in high-stakes settings. To address this, we introduce BiomedXPro, an evolutionary framework that leverages a large language model as both a biomedical knowledge extractor and an adaptive optimizer to automatically generate a diverse ensemble of interpretable, natural-language prompt pairs for disease diagnosis. Experiments on multiple biomedical benchmarks show that BiomedXPro consistently outperforms state-of-the-art prompt-tuning methods, particularly in data-scarce few-shot settings. Furthermore, our analysis demonstrates a strong semantic alignment between the discovered prompts and statistically significant clinical features, grounding the model's performance in verifiable concepts. By producing a diverse ensemble of interpretable prompts, BiomedXPro provides a verifiable basis for model predictions, representing a critical step toward the development of more trustworthy and clinically aligned AI systems.