🤖 AI Summary
Transformer-based clinical language models often exhibit overconfidence and poor uncertainty calibration, particularly in ambiguous clinical cases, undermining their reliability in high-stakes medical decision-making. Method: This paper proposes a lightweight Bayesian enhancement framework that requires no model retraining or architectural modification. It applies Monte Carlo Dropout at the embedding layer for Bayesian calibration, introduces an uncertainty-weighted attention mechanism, and adds a confidence-guided, risk-aware decision module. The approach incurs less than 3% parameter overhead, balancing computational efficiency with robust uncertainty quantification. Results: On MedQA, PubMedQA, and MIMIC-III, the method improves uncertainty calibration by 32–48%. In simulated human-AI collaborative diagnostic scenarios, it reduces diagnostic error rates by up to 41%. The framework thus strengthens the safety and trustworthiness of clinical decision support systems, offering a plug-and-play solution for uncertainty quantification in high-risk medical AI applications.
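The summary does not state the decision rule the risk-aware module uses. As a minimal illustrative sketch, one common formulation computes the predictive entropy of the mean class distribution over Monte Carlo forward passes and defers to a clinician when it exceeds a threshold; the function names, the entropy criterion, and the threshold value below are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def predictive_entropy(mc_probs):
    """Entropy of the mean class distribution over T stochastic
    forward passes. mc_probs has shape (T, num_classes)."""
    mean_p = mc_probs.mean(axis=0)
    return float(-(mean_p * np.log(mean_p + 1e-12)).sum())

def risk_aware_decision(mc_probs, defer_threshold=0.5):
    """Return (predicted_class, uncertainty, defer_to_human).
    defer_threshold is an illustrative value, not from the paper."""
    mean_p = mc_probs.mean(axis=0)
    u = predictive_entropy(mc_probs)
    return int(mean_p.argmax()), u, u > defer_threshold

rng = np.random.default_rng(0)
# Confident case: all stochastic passes agree on class 1.
confident = np.tile([0.05, 0.9, 0.05], (20, 1))
# Ambiguous case: passes disagree across all three classes.
ambiguous = rng.dirichlet([1.0, 1.0, 1.0], size=20)

print(risk_aware_decision(confident))  # low entropy, no deferral
print(risk_aware_decision(ambiguous))  # high entropy, flagged for review
```

Deferral on high entropy is what turns a raw classifier into the human-AI collaboration loop the summary describes: the model answers routine cases and routes ambiguous ones for review.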
📝 Abstract
We propose MedBayes-Lite, a lightweight Bayesian enhancement for transformer-based clinical language models designed to produce reliable, uncertainty-aware predictions. Although transformers show strong potential for clinical decision support, they remain prone to overconfidence, especially in ambiguous medical cases where calibrated uncertainty is critical. MedBayes-Lite embeds uncertainty quantification directly into existing transformer pipelines without any retraining or architectural rewiring, adding no new trainable layers and keeping parameter overhead under 3 percent. The framework integrates three components: (i) Bayesian Embedding Calibration using Monte Carlo dropout for epistemic uncertainty, (ii) Uncertainty-Weighted Attention that marginalizes over token reliability, and (iii) Confidence-Guided Decision Shaping inspired by clinical risk minimization. Across biomedical QA and clinical prediction benchmarks (MedQA, PubMedQA, MIMIC-III), MedBayes-Lite consistently improves calibration and trustworthiness, reducing overconfidence by 32 to 48 percent. In simulated clinical settings, it can prevent up to 41 percent of diagnostic errors by flagging uncertain predictions for human review. These results demonstrate its effectiveness in enabling reliable uncertainty propagation and improving interpretability in medical AI systems.
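The abstract names components (i) and (ii) but not their equations. One plausible reading, sketched below in NumPy, applies dropout masks to token embeddings over T stochastic passes (Monte Carlo dropout), takes the per-token variance as an epistemic-uncertainty score, and penalizes the attention logits of unreliable key tokens. All shapes, the variance-based penalty, the `beta` knob, and the function names are assumptions for illustration, not MedBayes-Lite's actual formulation.

```python
import numpy as np

def mc_dropout_embeddings(emb, T=20, p=0.1, seed=0):
    """Monte Carlo dropout over token embeddings.
    emb: (seq_len, d). Returns the mean embedding and a per-token
    uncertainty score (variance averaged over dimensions)."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(T):
        mask = rng.random(emb.shape) >= p        # drop units w.p. p
        samples.append(emb * mask / (1.0 - p))   # inverted-dropout scaling
    samples = np.stack(samples)                  # (T, seq_len, d)
    return samples.mean(axis=0), samples.var(axis=0).mean(axis=1)

def uncertainty_weighted_attention(q, k, v, token_uncertainty, beta=1.0):
    """Scaled dot-product attention whose logits are reduced in
    proportion to each key token's uncertainty (beta: illustrative)."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) - beta * token_uncertainty
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)        # softmax over keys
    return w @ v, w

rng = np.random.default_rng(1)
emb = rng.normal(size=(5, 8))                    # 5 tokens, dim 8
mean_emb, unc = mc_dropout_embeddings(emb)
out, weights = uncertainty_weighted_attention(mean_emb, mean_emb, mean_emb, unc)
print(unc.round(3), weights.shape)               # higher unc -> less attention
```

Because dropout is only applied at inference time and the penalty enters the existing attention computation, a sketch like this adds no trainable parameters, which is consistent with the abstract's plug-and-play claim.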