EveryQuery: Zero-Shot Clinical Prediction via Task-Conditioned Pretraining over Electronic Health Records

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes EveryQuery, a novel approach that overcomes key limitations of traditional autoregressive models based on electronic health records for zero-shot clinical prediction—namely high computational cost, susceptibility to noise, and difficulty in directly addressing specific clinical queries. By integrating task-conditioned pretraining with structured query encoding, EveryQuery enables direct estimation of future clinical event probabilities in a single forward pass, without requiring fine-tuning, linear probing, or trajectory generation. The method achieves efficient zero-shot prediction across arbitrary clinical tasks, outperforming autoregressive baselines on 82% of 39 randomly selected tasks from MIMIC-IV, with an average AUC improvement of 0.16. Notably, it demonstrates substantial gains in predicting rare events, effectively alleviating the modeling challenges associated with low-prevalence outcomes.

📝 Abstract
Foundation models pretrained on electronic health records (EHR) have demonstrated zero-shot clinical prediction capabilities by generating synthetic patient futures and aggregating statistics over sampled trajectories. However, this autoregressive inference procedure is computationally expensive, statistically noisy, and not natively promptable because users cannot directly condition predictions on specific clinical questions. In this preliminary work, we introduce EveryQuery, an EHR foundation model that achieves zero-shot inference through task-conditioned pre-training. Rather than generating future events, EveryQuery takes as input a patient's history and a structured query specifying a clinical task, and directly estimates the likelihood of the outcome occurring in the future window via a single forward pass. EveryQuery realizes this capability by pre-training over randomly sampled combinations of query tasks and patient contexts, directly training the model to produce correct answers to arbitrary input prompts. This enables zero-shot prediction for any task in the query space without finetuning, linear probing, or trajectory generation. On MIMIC-IV, EveryQuery outperforms an autoregressive baseline on 82% of 39 randomly sampled prediction tasks, with a mean AUC improvement of +0.16 (95% CI: [0.10,0.22]). This advantage remains consistent on tasks that were explicitly held out from the pre-training distribution. Further, EveryQuery's performance gains are most pronounced for rare clinical events, affirming and demonstrating a solution to the fundamental limitation of autoregressive inference for low-prevalence outcomes. However, at present, EveryQuery underperforms on tasks requiring disjunctive reasoning over multiple codes, such as 30-day readmission, exposing a concrete expressiveness limitation of the current query language.
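The core idea in the abstract — scoring a clinical query against a patient history in a single forward pass, with no trajectory sampling — can be sketched as follows. This is a toy illustration only: the `embed` and `score` functions, the token format (`ICD10:`, `TARGET:`, `WINDOW:` prefixes), and the dot-product scorer are all hypothetical stand-ins, not the paper's actual architecture or query language.

```python
# Toy sketch of task-conditioned zero-shot scoring (illustrative only;
# not the EveryQuery architecture).
import math

DIM = 8

def embed(tokens):
    """Toy embedding: hash each token into a fixed-size sign vector and average."""
    vec = [0.0] * DIM
    for tok in tokens:
        h = hash(tok)
        for i in range(DIM):
            vec[i] += ((h >> i) & 1) * 2 - 1
    n = max(len(tokens), 1)
    return [v / n for v in vec]

def score(history, query):
    """One forward pass: combine history and query embeddings and squash to a
    probability. Contrast with autoregressive inference, which would sample
    many future trajectories and count outcome occurrences."""
    h, q = embed(history), embed(query)
    logit = sum(a * b for a, b in zip(h, q))
    return 1.0 / (1.0 + math.exp(-logit))

# A patient history (EHR codes) and a structured query asking, e.g.,
# "will code ICD10:I50 occur within a 30-day window?" (format is made up here).
history = ["ICD10:E11", "LAB:HbA1c", "RX:metformin"]
query = ["TARGET:ICD10:I50", "WINDOW:30d"]

p = score(history, query)
assert 0.0 < p < 1.0
```

Because pretraining samples random (query, patient-context) pairs, any task expressible in the query space can be answered zero-shot; the readmission failure mode noted in the abstract arises when a task needs a disjunction over many codes that the query language cannot express.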
Problem

Research questions and friction points this paper is trying to address.

zero-shot clinical prediction
electronic health records
autoregressive inference
task-conditioned pretraining
clinical query
Innovation

Methods, ideas, or system contributions that make the work stand out.

task-conditioned pretraining
zero-shot clinical prediction
electronic health records
structured query
foundation model
Payal Chandak
Harvard-MIT Health Sciences and Technology
Gregory Kondas
Biomedical Informatics, Columbia
Isaac Kohane
Harvard Medical School, Children's Hospital, Brigham and Women's Hospital
Bioinformatics · Artificial Intelligence · Autism · Electronic Health Records · Functional Genomics
Matthew McDermott
Biomedical Informatics, Columbia