Assessing the Real-World Utility of Explainable AI for Arousal Diagnostics: An Application-Grounded User Study

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Clinicians’ trust in AI recommendations remains a critical barrier to deploying explainable AI (XAI) in sleep medicine, particularly for diagnosing nocturnal arousal events. Method: We propose a novel “white-box AI as quality control (QC)” paradigm and conducted a multi-stage user study with eight clinical experts, comparing three modes: manual scoring, real-time black-box assistance, and post-hoc white-box QC. Contribution/Results: This is the first empirical demonstration that both explanation depth and timing of XAI intervention jointly determine human-AI collaboration efficacy. The white-box QC mode improved event-level diagnostic accuracy by ~30%, enhanced count-level consistency, and reduced inter-rater variability. Most experts preferred transparent systems and affirmed their clinical utility. Critically, structured XAI integration outperformed individual expert performance, providing key empirical evidence for trustworthy clinical AI deployment.

📝 Abstract
Artificial intelligence (AI) systems increasingly match or surpass human experts in biomedical signal interpretation. However, their effective integration into clinical practice requires more than high predictive accuracy: clinicians must discern *when* and *why* to trust algorithmic recommendations. This work presents an application-grounded user study with eight professional sleep medicine practitioners, who score nocturnal arousal events in polysomnographic data under three conditions: (i) manual scoring, (ii) black-box (BB) AI assistance, and (iii) transparent white-box (WB) AI assistance. Assistance is provided either from the *start* of scoring or as a post-hoc quality-control (*QC*) review. We systematically evaluate how the type and timing of assistance influence event-level performance, the clinically most relevant count-based performance, time requirements, and user experience. When evaluated against the clinical standard used to train the AI, both AI and human-AI teams significantly outperform unaided experts, with collaboration also reducing inter-rater variability. Notably, transparent AI assistance applied as a targeted QC step yields median event-level performance improvements of approximately 30% over black-box assistance, and QC timing further enhances count-based outcomes. While WB and QC approaches increase the time required for scoring, start-time assistance is faster and preferred by most participants. Participants overwhelmingly favor transparency, with seven out of eight expressing willingness to adopt the system with minor or no modifications. In summary, strategically timed transparent AI assistance effectively balances accuracy and clinical efficiency, providing a promising pathway toward trustworthy AI integration and user acceptance in clinical workflows.
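The abstract distinguishes event-level performance (did the scorer mark the same arousal events as the reference?) from count-based performance (did the scorer report the same number of events per recording?). A minimal sketch of how these two views can diverge is shown below; the overlap-based matching rule and the interval representation are assumptions for illustration, not the authors' evaluation code.

```python
# Hypothetical sketch (not the paper's code): event-level F1 vs. count-level
# agreement for arousal scoring. Events are (start, end) intervals in seconds.

def events_overlap(a, b):
    """True if two (start, end) intervals overlap in time."""
    return a[0] < b[1] and b[0] < a[1]

def event_level_f1(pred, ref):
    """F1 over events, matching by any temporal overlap (an assumed rule;
    clinical scoring criteria may require a minimum overlap)."""
    tp = sum(any(events_overlap(p, r) for r in ref) for p in pred)
    precision = tp / len(pred) if pred else 0.0
    recall = (sum(any(events_overlap(r, p) for p in pred) for r in ref) / len(ref)
              if ref else 0.0)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def count_error(pred, ref):
    """Absolute difference in total event counts (count-level consistency)."""
    return abs(len(pred) - len(ref))

# Toy example: the scorer finds two of three reference events.
reference = [(10, 25), (40, 52), (90, 101)]
scorer = [(12, 24), (41, 50)]

print(round(event_level_f1(scorer, reference), 3))  # precision 1.0, recall 2/3 -> 0.8
print(count_error(scorer, reference))               # 1
```

The example shows why the study reports both metrics: a scorer can have high event-level precision yet still miss events, which shows up as a nonzero count error at the recording level.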
Problem

Research questions and friction points this paper is trying to address.

Evaluating real-world utility of explainable AI for clinical arousal diagnostics
Assessing how AI transparency and timing affect clinician performance and trust
Comparing black-box versus white-box AI assistance in sleep medicine practice
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transparent white-box AI assistance improves diagnostic accuracy
Application-grounded user study validates clinical utility
Strategic timing balances accuracy and workflow efficiency
Stefan Kraft
IT-Designers Gruppe
Andreas Theissler
University of Giessen
Vera Wienhausen-Wilke
Klinikum Esslingen, Klinik für Kardiologie, Pneumologie und Angiologie
Gjergji Kasneci
Professor at the Technical University of Munich
Responsible Data Science · Responsible AI · Explainable Machine Learning · Algorithmic Accountability
Hendrik Lensch
University of Tübingen