LLM-Guided Evolution for Medical Decision Pipelines

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work frames clinical decision-making as an evolutionary search over executable programs. The goal is to lower the cost of adapting large language models (LLMs) to clinical workflows without expensive fine-tuning or manual prompt engineering. The authors apply LLM-guided MAP-Elites evolution at inference time, optimizing executable artifacts against task-specific fitness functions. They evaluate the approach in three clinical settings. In urgency triage, Semigran accuracy rises from 77.3% to 87.1% and emergency recall from 0.60 to 0.97. In interactive consultation, evolved policies improve the accuracy–cost trade-off and transfer to the held-out iCRAFTMD benchmark. In PneumoniaMNIST image classification, prompt-only evolution improves frozen MedGemma vision-language models while preserving strict JSON outputs. The authors attribute the gains to interpretable, program-level mechanisms rather than superficial prompt rewording.
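The summary reports Semigran accuracy and emergency recall without defining them. The following is a minimal sketch, assuming accuracy is exact agreement between predicted and gold urgency labels and emergency recall is recall on the emergency class. The label strings and the five toy cases are hypothetical, not data from the paper.

```python
from typing import Sequence

# Hypothetical label for the most urgent class; the paper's exact label set is not given here.
EMERGENCY = "emergency"


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of cases whose predicted urgency level matches the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("empty input")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def emergency_recall(y_true: Sequence[str], y_pred: Sequence[str],
                     positive: str = EMERGENCY) -> float:
    """Share of true emergencies that are also predicted as emergencies."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    hits = [p == positive for t, p in zip(y_true, y_pred) if t == positive]
    if not hits:
        raise ValueError("no emergency cases in y_true")
    return sum(hits) / len(hits)


# Illustrative toy cases (not from the paper).
gold = ["emergency", "non_emergency", "self_care", "emergency", "emergency"]
pred = ["emergency", "emergency", "self_care", "non_emergency", "emergency"]
print(accuracy(gold, pred))          # 0.6
print(emergency_recall(gold, pred))  # 2 of 3 emergencies caught
```

Recall on the emergency class counts how many true emergencies the program catches, so it directly penalizes under-triage, which an overall accuracy score can hide.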

📝 Abstract

Adapting large language models (LLMs) to clinical workflows often requires costly fine-tuning or manual prompt and pipeline engineering. We study LLM-guided MAP-Elites evolution as an inference-time alternative for discovering medical decision strategies and provide an implementation repository at https://github.com/univanxx/llm_guided_evo_medical. We formulate urgency triage, interactive consultation, and medical image classification as evolutionary searches over executable artifacts optimized by task-specific fitness functions. Across all three settings, evolution improves over manually designed baselines under practical constraints. In triage, evolved programs increase Semigran accuracy from 77.3% to 87.1% and emergency recall from 0.60 to 0.97, while improving safety-weighted held-out MIMIC-ESI performance. In interactive consultation, evolved policies improve the accuracy–cost frontier across Llama-3, Qwen-3.5, and Gemma-4 and transfer to held-out iCRAFTMD. In PneumoniaMNIST, prompt-only evolution improves frozen MedGemma VLMs while preserving strict JSON outputs. Qualitative analysis shows that the gains come from interpretable program-level mechanisms (calibrated triage boundaries, targeted evidence acquisition, selective commitment, and finding-oriented visual decision rules) rather than from superficial prompt rewording alone.
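The abstract's "accuracy–cost frontier" is most naturally read as a Pareto frontier: the set of policies that no other policy beats on both accuracy and cost at once. The sketch below computes that non-dominated set. The cost unit (here, questions asked per case), the policy names, and all numbers are illustrative assumptions, not results from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PolicyResult:
    name: str        # hypothetical policy identifier
    accuracy: float  # diagnostic accuracy on an evaluation set (higher is better)
    cost: float      # assumed cost unit, e.g. mean questions per case (lower is better)


def dominates(a: PolicyResult, b: PolicyResult) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(results: List[PolicyResult]) -> List[PolicyResult]:
    """Keep only non-dominated policies, sorted by increasing cost."""
    front = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(front, key=lambda r: r.cost)


# Illustrative numbers only.
candidates = [
    PolicyResult("baseline", accuracy=0.55, cost=6.0),
    PolicyResult("ask_more", accuracy=0.63, cost=9.0),
    PolicyResult("targeted", accuracy=0.62, cost=5.0),
    PolicyResult("early_commit", accuracy=0.50, cost=2.0),
]
print([r.name for r in pareto_frontier(candidates)])
# ['early_commit', 'targeted', 'ask_more']
```

Under this reading, an evolved policy improves the frontier when it contributes a point that no baseline dominates; in the toy data, "targeted" removes "baseline" from the frontier because it is both more accurate and cheaper.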

Problem

Research questions and friction points this paper is trying to address.

LLM adaptation

medical decision pipelines

clinical workflows

automated strategy discovery

inference-time optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-guided evolution

MAP-Elites

medical decision pipelines
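MAP-Elites keeps an archive with one elite per cell of a behavior-descriptor grid: each new candidate is evaluated, mapped to a cell, and kept only if that cell is empty or the candidate beats the current elite. The following is a minimal sketch of that loop. In the paper, new candidates come from an LLM editing executable artifacts; here `propose_variant` is a hypothetical random perturbation standing in for that LLM call, and the toy `evaluate` function, its fitness, and the use of interview length as the descriptor are illustrative assumptions, not the authors' setup.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

Program = Dict[str, float]                      # toy "program": a few tunable knobs (hypothetical)
Archive = Dict[Hashable, Tuple[float, Program]]  # descriptor cell -> (fitness, elite)


def map_elites(
    seeds: List[Program],
    propose_variant: Callable[[Program, random.Random], Program],
    evaluate: Callable[[Program], Tuple[float, Hashable]],
    iterations: int,
    rng: random.Random,
) -> Archive:
    """Minimal MAP-Elites: keep the best program found in each descriptor cell."""
    archive: Archive = {}

    def try_insert(prog: Program) -> None:
        fitness, cell = evaluate(prog)
        incumbent = archive.get(cell)
        # Fill an empty cell, or replace the elite only if the newcomer is fitter.
        if incumbent is None or fitness > incumbent[0]:
            archive[cell] = (fitness, prog)

    for prog in seeds:
        try_insert(prog)
    for _ in range(iterations):
        # Pick a random elite as parent and ask the variation operator for a child.
        _, parent = rng.choice(list(archive.values()))
        try_insert(propose_variant(parent, rng))
    return archive


# --- Hypothetical stand-ins; the paper's variation operator is an LLM rewriting code. ---
def propose_variant(parent: Program, rng: random.Random) -> Program:
    child = dict(parent)
    child["threshold"] = min(1.0, max(0.0, parent["threshold"] + rng.gauss(0, 0.1)))
    step = rng.choice([-1, 0, 1])
    child["max_questions"] = max(0, min(8, int(parent["max_questions"]) + step))
    return child


def evaluate(prog: Program) -> Tuple[float, Hashable]:
    # Toy fitness: peaks at threshold 0.7, with a small bonus per allowed question.
    fitness = -(prog["threshold"] - 0.7) ** 2 + 0.02 * prog["max_questions"]
    cell = int(prog["max_questions"])  # behavior descriptor: interview length
    return fitness, cell


rng = random.Random(0)
seed = {"threshold": 0.3, "max_questions": 3}
archive = map_elites([seed], propose_variant, evaluate, 500, rng)
for cell in sorted(archive):
    fit, prog = archive[cell]
    print(cell, round(prog["threshold"], 2), round(fit, 3))
```

Keeping one elite per descriptor cell, rather than a single global best, is what lets the search return a spread of strategies (for example, short and long interviews) instead of collapsing onto one.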