TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs

📅 2026-06-08
🤖 AI Summary
Clinical early-warning systems that process irregularly sampled medical time series need both well-calibrated continuous risk scores and explanations that clinicians can verify. Existing large language models tend to collapse graded clinical risk into overconfident binary predictions, a failure the authors call risk polarization. To counter it, the authors propose TRIAGE, a framework that trains a large language model to reason dialectically: it reasons over competing clinical outcomes and generates a rationale for each. This yields continuous risk scores grounded in explicit clinical logic. On three medical time-series benchmarks, TRIAGE improves average AUPRC by 3.3% and reduces calibration error by 81% relative to competitive baselines, and in an LLM-as-a-judge assessment its rationales surpass the baseline's post-hoc explanations by 20% in clinical reasoning quality.
📝 Abstract
Clinical early warning systems built on electronic health records, in which clinical observations are recorded as irregularly sampled medical time series (ISMTS), must deliver both calibrated risk scores for patient triage and interpretable rationales that clinicians can verify. Large Language Models (LLMs) have been explored for this task, yet they collapse graded clinical risk into overconfident binary predictions. This risk polarization undermines both calibration and cross-patient comparability. To address this, we propose TRIAGE, a framework that trains an LLM to generate dialectical reasoning over competing clinical outcomes by eliciting outcome-specific rationales. This dialectical formulation mitigates risk polarization, enabling a single LLM to yield continuous risk scores grounded in explicit clinical reasoning. Evaluated on three ISMTS benchmarks, TRIAGE achieves an average AUPRC improvement of 3.3% and reduces calibration error by 81% compared with competitive baselines. An LLM-as-a-judge assessment further shows that our rationales surpass post-hoc explanations from the baseline by 20% in clinical reasoning quality. The source code is available at https://github.com/HyeongWon-Jang/TRIAGE.
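To make the link between risk polarization and calibration concrete, the following is a minimal sketch of one common calibration metric, expected calibration error (ECE) with equal-width bins. It is an illustration only: the abstract does not state which calibration metric the paper uses, and the function name `expected_calibration_error`, the bin count, and the toy cohort are all assumptions rather than anything taken from the TRIAGE code.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE: the sample-weighted mean gap between the
    average predicted risk and the observed event rate in each bin.
    (Assumed metric for illustration; not confirmed as the paper's.)"""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Clamp p == 1.0 into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_risk = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_risk - event_rate)
    return ece


# Hypothetical toy cohort: 5 lower-risk patients (1 event) and
# 5 higher-risk patients (4 events).
labels = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# Graded scores that match each group's observed event rate.
graded = [0.2] * 5 + [0.8] * 5

# Polarized scores: the same grouping pushed to the extremes 0 and 1.
polarized = [0.0] * 5 + [1.0] * 5

ece_graded = expected_calibration_error(graded, labels)
ece_polarized = expected_calibration_error(polarized, labels)
print(f"graded ECE:    {ece_graded:.3f}")
print(f"polarized ECE: {ece_polarized:.3f}")
```

In this toy cohort both score sets separate the two groups identically, but only the graded scores match the observed event rates; pushing the same predictions to 0 and 1 leaves a calibration gap of 0.2 in each occupied bin.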
Problem

Research questions and friction points this paper is trying to address.

risk prediction
irregularly sampled medical time series
clinical early warning systems
risk polarization
explainability
Innovation

Methods, ideas, or system contributions that make the work stand out.

dialectical reasoning
risk calibration
irregularly sampled medical time series
explainable AI
large language models