🤖 AI Summary
This work addresses the semantic gap between predictive uncertainty in deep learning models and the linguistic uncertainty expressed in radiologists' free-text reports, aiming to improve the clinical interpretability and trustworthiness of automated chest X-ray interpretation. Method: We systematically quantify the correlation between approximate Bayesian uncertainty (via MC Dropout and Deep Ensembles) and linguistically derived uncertainty (extracted from radiology reports using BERT-based text encoding), combining uncertainty annotation, binarisation of uncertainty labels, and multi-strategy comparison. Results: We find only moderate correlation (Spearman ρ ≈ 0.4–0.5), indicating that current data-driven uncertainty estimation fails to capture the fine-grained semantic distinctions inherent in clinical language. Contribution: Our study identifies a fundamental bottleneck in aligning model and human uncertainty representations and provides empirical evidence and methodological pathways toward a language-aware, human-machine collaborative uncertainty-modelling paradigm for medical AI.
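To make the core measurement concrete, here is a minimal sketch (not the authors' code) of estimating predictive uncertainty with MC Dropout and correlating it against a per-image linguistic uncertainty score; the names `model`, `xray_batch`, and `linguistic_u` are illustrative assumptions.

```python
# Minimal sketch (assumed names): binary predictive entropy from MC Dropout,
# correlated against a linguistic uncertainty score via Spearman's rho.
import torch
from scipy.stats import spearmanr

def mc_dropout_uncertainty(model: torch.nn.Module, x: torch.Tensor,
                           n_samples: int = 20) -> torch.Tensor:
    """Binary predictive entropy from n_samples stochastic forward passes."""
    model.train()  # keep dropout active at inference (caveat: also affects batch norm)
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(x)) for _ in range(n_samples)])
    p = probs.mean(dim=0)  # Monte Carlo estimate of p(finding | image)
    eps = 1e-8
    return -(p * (p + eps).log() + (1 - p) * (1 - p + eps).log())

# `linguistic_u` is assumed to encode report-level uncertainty, e.g. 1 where a
# rule-based labeller flagged a finding as "uncertain" and 0 otherwise.
# predictive_u = mc_dropout_uncertainty(model, xray_batch).squeeze().numpy()
# rho, p_value = spearmanr(predictive_u, linguistic_u)
```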
📄 Abstract
Automating chest radiograph interpretation using Deep Learning (DL) models has the potential to significantly improve clinical workflows, decision-making, and large-scale health screening. In medical settings, however, merely optimising predictive performance is insufficient: quantifying uncertainty is equally crucial. This paper investigates the relationship between predictive uncertainty, derived from Bayesian Deep Learning approximations, and human (linguistic) uncertainty, as estimated from free-text radiology reports annotated by rule-based labellers. Utilising BERT as the model of choice, this study evaluates different binarisation methods for uncertainty labels and explores the efficacy of Monte Carlo Dropout and Deep Ensembles for estimating predictive uncertainty. The results demonstrate good model performance but only a modest correlation between predictive and linguistic uncertainty, highlighting the challenge of aligning machine uncertainty with the nuances of human interpretation. Our findings suggest that while Bayesian approximations provide valuable uncertainty estimates, further refinement is needed to fully capture and utilise the subtleties of human uncertainty in clinical applications.
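The sketch below illustrates the two remaining ingredients: binarising "uncertain" report labels and scoring Deep Ensemble disagreement. The CheXpert-style label encoding {1: positive, 0: negative, -1: uncertain} and the strategy names (U-Ones/U-Zeros) follow common usage in the chest X-ray literature and are assumptions here, not details confirmed by the abstract.

```python
# Minimal sketch under assumed CheXpert-style labels {1, 0, -1: uncertain}.
import numpy as np

def binarise_uncertain(labels: np.ndarray, strategy: str = "U-Ones") -> np.ndarray:
    """Map uncertain mentions (-1) to a binary training target."""
    out = labels.copy()
    if strategy == "U-Ones":     # treat uncertain mentions as positive
        out[out == -1] = 1
    elif strategy == "U-Zeros":  # treat uncertain mentions as negative
        out[out == -1] = 0
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return out

def ensemble_uncertainty(member_probs: np.ndarray) -> np.ndarray:
    """Deep Ensemble disagreement: variance across ensemble members.

    member_probs has shape (n_members, n_examples), one row of sigmoid
    outputs per independently trained network.
    """
    return member_probs.var(axis=0)
```

Variance across members is one simple disagreement measure; predictive entropy of the averaged member probabilities, as in the MC Dropout sketch above, is an equally common choice.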