Revisiting the ABCs of Working with AI: A Replication with Radiologists

📅 2026-06-10

🤖 AI Summary

This replication study examines why AI assistance helps some radiologists more than others, focusing on how baseline ability and belief calibration moderate the gains from human-AI collaboration. Using the repeated-case designs in the public Collab-CXR dataset, the author analyzes 11,420 paired radiologist-patient-pathology observations from 68 radiologists who read chest X-rays with access to machine learning predictions. The results reproduce the core finding of Caplin et al. (2025b): radiologists with lower baseline ability and higher belief calibration gain the most from AI support. This supports the external validity of that finding in a professional medical imaging setting.

📝 Abstract

Artificial intelligence (AI) systems increasingly assist human experts, but the consequences of AI assistance on productivity can be heterogeneous. Caplin, Deming, S. Li, Martin, Marx, Weidmann, and Ye (2025b) provide evidence that two characteristics, ability and belief calibration, help to determine the returns to AI assistance. This note shows that their results replicate to a setting where professional radiologists analyze chest X-rays with access to state-of-the-art machine learning predictions. I leverage the public Collab-CXR data repository described by Moehring, Kutwal, Huang, Banerjee, Jacobi, Eber, Mendoza, Chung, Dayan, Gupta, Bui, Truong, Pareek, Langlotz, Lungren, Agarwal, Rajpurkar, and Salz (2025) and first analyzed for human-AI collaboration by Agarwal, Moehring, Rajpurkar, and Salz (2023). To faithfully reproduce the analysis in Caplin, Deming, S. Li, Martin, Marx, Weidmann, and Ye (2025b), I use the radiologist assessments from the repeated-case designs, which include 68 radiologists and 11,420 paired radiologist-patient-pathology observations. The results of this replication support the external validity of their core findings: lower baseline ability and higher calibration predict larger incremental value from AI.
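
To make the paired design concrete, here is a minimal sketch of how per-radiologist measures could be computed from repeated-case data. It is not the paper's code. The record layout, the function name `summarize`, and the specific definitions (squared error as the performance measure, and the gap between mean stated probability and observed rate as a calibration proxy) are all assumptions chosen for illustration, and the toy records are invented.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Aggregate paired reads per radiologist.

    records: iterable of (radiologist, truth, p_no_ai, p_with_ai), where
    truth is 0 or 1 and the p_* values are stated probabilities for the
    same case read without and with AI (hypothetical layout).
    """
    by_reader = defaultdict(list)
    for reader, truth, p_no_ai, p_with_ai in records:
        by_reader[reader].append((truth, p_no_ai, p_with_ai))

    out = {}
    for reader, rows in by_reader.items():
        # Mean squared error of stated probabilities (lower is better).
        err_no_ai = mean((p0 - y) ** 2 for y, p0, _ in rows)
        err_ai = mean((p1 - y) ** 2 for y, _, p1 in rows)
        # Assumed calibration proxy: |mean stated probability - observed rate|.
        gap = abs(mean(p0 for _, p0, _ in rows) - mean(y for y, _, _ in rows))
        out[reader] = {
            "baseline_error": err_no_ai,      # proxy for (low) ability
            "calibration_gap": gap,           # smaller means better calibrated
            "ai_gain": err_no_ai - err_ai,    # incremental value from AI
        }
    return out


# Invented toy data: r1 is uninformative without AI, r2 is already perfect.
toy = [
    ("r1", 1, 0.5, 1.0),
    ("r1", 0, 0.5, 0.0),
    ("r2", 1, 1.0, 1.0),
    ("r2", 0, 0.0, 0.0),
]
summary = summarize(toy)
```

In this toy example the weaker reader `r1` has a positive `ai_gain` while `r2` has none, which mirrors the direction of the ability result in the abstract. Relating `ai_gain` to both measures across all 68 radiologists would require a regression, which is omitted here.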

Problem

Research questions and friction points this paper is trying to address.

AI assistance

productivity

ability

belief calibration

human-AI collaboration

Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-assisted diagnosis

belief calibration

human-AI collaboration