🤖 AI Summary
Existing research has not adequately explored whether large language models (LLMs) can jointly process clinical text and time-series data for predictive tasks. This paper proposes a lightweight, prompt-driven multimodal approach. It uses the DSPy framework to optimize prompts for instruction-tuned LLMs, so that off-the-shelf models, without architectural modification, can reason jointly over unstructured clinical narratives and structured temporal EHR data (e.g., vital signs and lab results). Across multiple clinical outcome classification tasks, the method performs on par with specialized multimodal models while requiring less system complexity and adapting more readily across tasks. Its core idea is to handle the time-series modality at the prompt level rather than through dedicated model components, so that text and sequential data are represented and reasoned over in a single input. This offers an efficient and comparatively simple route to multimodal clinical prediction.
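The summary does not say how structured time series are placed into the prompt. One common option is to serialize each series into a short textual summary next to the clinical note. The sketch below assumes that option; the `Series` type, the summary fields (count, min, max, first/last value, trend) and the prompt template are all hypothetical illustrations, not the paper's actual format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Series:
    """Hypothetical container for one structured EHR time series."""
    name: str
    unit: str
    points: List[Tuple[float, float]]  # (hours since admission, value)


def summarize_series(s: Series) -> str:
    """Turn one time series into a compact text line an LLM can read."""
    if not s.points:
        return f"{s.name} ({s.unit}): no measurements"
    values = [v for _, v in s.points]
    first_t, first_v = s.points[0]
    last_t, last_v = s.points[-1]
    # Crude trend label from first vs. last value (an illustrative choice).
    if last_v > first_v:
        trend = "rising"
    elif last_v < first_v:
        trend = "falling"
    else:
        trend = "stable"
    return (
        f"{s.name} ({s.unit}): n={len(values)}, min={min(values):g}, "
        f"max={max(values):g}, first={first_v:g} at {first_t:g}h, "
        f"last={last_v:g} at {last_t:g}h, trend={trend}"
    )


def build_prompt(note: str, series: List[Series], question: str) -> str:
    """Combine the free-text note and serialized series into one prompt."""
    lines = ["Clinical note:", note.strip(), "", "Structured time series:"]
    lines += [f"- {summarize_series(s)}" for s in series]
    lines += ["", f"Question: {question}", "Answer with 'yes' or 'no'."]
    return "\n".join(lines)


if __name__ == "__main__":
    hr = Series("Heart rate", "bpm", [(0, 88), (6, 97), (12, 112)])
    print(build_prompt("Pt admitted with sepsis.", [hr], "Will the patient die in hospital?"))
```

Summary statistics keep prompts short but discard fine-grained temporal detail; listing raw timestamped values is the obvious alternative when the context window allows it.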
📝 Abstract
Large language models (LLMs) excel at text generation, but their ability to handle clinical classification tasks involving structured data, such as time series, remains underexplored. In this work, we adapt instruction-tuned LLMs using DSPy-based prompt optimization to process clinical notes and structured EHR inputs jointly. Our results show that this approach achieves performance on par with specialized multimodal systems while requiring less complexity and offering greater adaptability across tasks.
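Prompt optimization, in its simplest form, means scoring candidate prompts against a labeled development set and keeping the best one. The sketch below shows only that core loop, with a hypothetical `classify(instruction, input)` callable standing in for an LLM call. It is a deliberately simplified stand-in, not DSPy's API or algorithm; DSPy's optimizers are considerably more elaborate (for example, they can also bootstrap few-shot demonstrations).

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for an LLM call: (instruction, input text) -> label.
Classifier = Callable[[str, str], str]


def accuracy(classify: Classifier, instruction: str,
             dev: Sequence[Tuple[str, str]]) -> float:
    """Fraction of dev examples the instruction gets right."""
    correct = sum(classify(instruction, x) == y for x, y in dev)
    return correct / len(dev)


def select_instruction(classify: Classifier, candidates: Sequence[str],
                       dev: Sequence[Tuple[str, str]]) -> Tuple[str, float]:
    """Return the candidate instruction with the highest dev accuracy."""
    scored = [(accuracy(classify, c, dev), c) for c in candidates]
    best_score, best = max(scored, key=lambda t: t[0])  # first wins on ties
    return best, best_score


def stub_classifier(instruction: str, x: str) -> str:
    """Toy rule-based 'model' so the example runs without an LLM."""
    uses_trend = "trend" in instruction.lower()
    return "yes" if uses_trend and "rising" in x else "no"


if __name__ == "__main__":
    dev = [("trend=rising", "yes"), ("trend=stable", "no"), ("trend=rising", "yes")]
    candidates = ["Predict mortality.",
                  "Use the trend of each vital sign to predict mortality."]
    print(select_instruction(stub_classifier, candidates, dev))
```

With the toy classifier, the trend-aware instruction scores 1.0 on the three dev examples and the generic one scores 1/3, so the former is selected. In practice `classify` would wrap a real LLM call, and the dev set would hold serialized patient records with outcome labels.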