🤖 AI Summary
This work proposes EveryQuery, an approach that overcomes key limitations of autoregressive electronic health record (EHR) foundation models for zero-shot clinical prediction: high computational cost, susceptibility to sampling noise, and the inability to condition predictions directly on specific clinical queries. By combining task-conditioned pretraining with structured query encoding, EveryQuery estimates the probability of future clinical events in a single forward pass, without fine-tuning, linear probing, or trajectory generation. The method achieves efficient zero-shot prediction across arbitrary clinical tasks, outperforming autoregressive baselines on 82% of 39 randomly sampled tasks from MIMIC-IV, with an average AUC improvement of 0.16. Notably, its gains are largest for rare events, alleviating the modeling challenges associated with low-prevalence outcomes.
📝 Abstract
Foundation models pretrained on electronic health records (EHR) have demonstrated zero-shot clinical prediction capabilities by generating synthetic patient futures and aggregating statistics over sampled trajectories. However, this autoregressive inference procedure is computationally expensive, statistically noisy, and not natively promptable: users cannot directly condition predictions on specific clinical questions. In this preliminary work, we introduce EveryQuery, an EHR foundation model that achieves zero-shot inference through task-conditioned pretraining. Rather than generating future events, EveryQuery takes as input a patient's history and a structured query specifying a clinical task, and directly estimates the likelihood of the outcome occurring in the future window via a single forward pass. EveryQuery realizes this capability by pretraining over randomly sampled combinations of query tasks and patient contexts, directly training the model to produce correct answers to arbitrary input prompts. This enables zero-shot prediction for any task in the query space without fine-tuning, linear probing, or trajectory generation. On MIMIC-IV, EveryQuery outperforms an autoregressive baseline on 82% of 39 randomly sampled prediction tasks, with a mean AUC improvement of +0.16 (95% CI: [0.10, 0.22]). This advantage holds even on tasks that were explicitly held out from the pretraining distribution. Further, EveryQuery's performance gains are most pronounced for rare clinical events, confirming the fundamental limitation of autoregressive inference for low-prevalence outcomes and demonstrating a solution to it. However, at present, EveryQuery underperforms on tasks requiring disjunctive reasoning over multiple codes, such as 30-day readmission, exposing a concrete expressiveness limitation of the current query language.
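To make the query interface concrete, below is a minimal NumPy sketch of the "history + structured query → probability in one forward pass" idea described in the abstract. All names, dimensions, the mean-pooled history encoder, and the random weights are illustrative assumptions for exposition; they are not the paper's actual architecture or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and randomly initialized weights (illustrative only).
VOCAB, D = 100, 16
event_emb = rng.normal(size=(VOCAB, D)) * 0.1   # embeddings for history event codes
query_emb = rng.normal(size=(VOCAB, D)) * 0.1   # embeddings for the queried target code
w_window = rng.normal(size=(1, D)) * 0.1        # projects the future-window length
w_head = rng.normal(size=(3 * D,)) * 0.1        # scoring head over the joint representation

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(history_codes, query_code, window_days):
    """Single forward pass: patient history + structured query -> event probability.

    The query is a (target code, future time window) pair, mirroring the
    task-conditioned setup; no future trajectories are generated or sampled.
    """
    h = event_emb[history_codes].mean(axis=0)   # pooled patient-history representation
    q = query_emb[query_code]                   # representation of the queried clinical code
    w = np.array([window_days]) @ w_window      # encoded prediction window
    z = np.concatenate([h, q, w.ravel()])
    return sigmoid(w_head @ z)

# One query: "will code 8 occur within 30 days, given this history?"
p = predict(history_codes=[3, 17, 42], query_code=8, window_days=30.0)
```

Pretraining in this setup would sample random (query task, patient context) pairs and fit `predict` against the observed outcome with a binary cross-entropy loss, which is what lets any task expressible in the query space be answered zero-shot at inference time.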