🤖 AI Summary
Existing medical tabular models often neglect unstructured clinical text and fail to fully exploit the semantic information embedded in structured fields, which limits their predictive performance in electronic health record (EHR) applications. To address this, we propose the first prompt-driven multimodal tabular Transformer framework. Our method introduces a medical-knowledge-guided table cell semantic alignment mechanism to jointly encode numerical values, categorical labels, and free-text entries. It employs a two-stage Transformer architecture integrating BioBERT pre-trained representations, learnable medical prompt templates, hierarchical cell embeddings, and table-level modeling. Evaluated on three clinical prediction tasks across two real-world EHR datasets, our model achieves state-of-the-art performance: reductions of up to 10.9% in RMSE and 11.0% in MAE, and improvements of 1.6% in balanced accuracy (BACC) and 0.8% in AUROC over existing approaches.
📝 Abstract
Medical tabular data, abundant in Electronic Health Records (EHRs), is a valuable resource for diverse medical tasks such as risk prediction. While deep learning approaches, particularly transformer-based models, have shown remarkable performance in tabular data prediction, existing work still cannot be effectively adapted to the medical domain: it ignores unstructured free text and underutilizes the textual information in structured data. To address these issues, we propose PTransformer, a Prompt-based multimodal Transformer architecture designed specifically for medical tabular data. This framework consists of two critical components: a tabular cell embedding generator and a tabular transformer. The former encodes diverse modalities from both structured and unstructured tabular data into a harmonized language semantic space with the help of a pre-trained sentence encoder and medical prompts. The latter integrates cell representations to generate patient embeddings for various medical tasks. In comprehensive experiments on two real-world datasets for three medical tasks, PTransformer improves over state-of-the-art (SOTA) baselines by 10.9%/11.0% on RMSE/MAE, 0.5%/2.2% on RMSE/MAE, and 1.6%/0.8% on BACC/AUROC, respectively.
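The two-component pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. The prompt template (`"<column>: <value>"`), the function names, and the example row are all hypothetical. The hash-based `toy_sentence_encoder` merely stands in for a pre-trained sentence encoder, and `mean_pool` is a placeholder for the tabular transformer that would aggregate cell representations into a patient embedding.

```python
import hashlib
import math


def cell_to_prompt(column, value):
    """Turn one table cell into a short text prompt (hypothetical template)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return f"{column}: not recorded"
    if isinstance(value, (int, float)):
        # Numbers are rendered compactly, e.g. 88.0 -> "88".
        return f"{column}: {value:g}"
    # Categorical labels and free text are passed through as-is.
    return f"{column}: {value}"


def toy_sentence_encoder(text, dim=8):
    """Stand-in for a pre-trained sentence encoder (assumption).

    Maps text to a deterministic vector of `dim` floats in [0, 1]
    derived from a SHA-256 digest. It carries no semantics.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def embed_row(row, dim=8):
    """Cell embedding step: one vector per cell, all in the same space."""
    return [toy_sentence_encoder(cell_to_prompt(c, v), dim) for c, v in row.items()]


def mean_pool(cell_vectors):
    """Placeholder for the tabular transformer: average the cell vectors."""
    n = len(cell_vectors)
    return [sum(column) / n for column in zip(*cell_vectors)]


# Hypothetical patient row mixing numeric, categorical, and free-text cells.
row = {
    "age": 67,
    "sex": "female",
    "heart rate": 88.0,
    "clinical note": "Patient reports chest pain on exertion.",
}
cells = embed_row(row)
patient = mean_pool(cells)
```

Because every cell, whatever its modality, is first rendered as text, a single encoder can place all cells in one vector space. In a real system the toy encoder would be replaced by an actual sentence encoder and the pooling by a trained transformer over the cell vectors.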