🤖 AI Summary
This work proposes the first multimodal slice discovery framework tailored to medical applications. Existing auditing methods rely primarily on unimodal features or metadata-based subgroup analyses and therefore fail to comprehensively identify and interpret systematic model failures. By extending slice discovery into multimodal representation spaces, the framework integrates heterogeneous sources, such as medical images and their associated clinical text, to automatically detect and explain systematic failure modes in image classifiers. Combining multimodal representation learning with interpretable slice discovery algorithms, the approach significantly outperforms unimodal baselines on the MIMIC-CXR-JPG dataset, enabling more comprehensive, accurate, and interpretable model auditing.
📝 Abstract
Despite advances in machine learning-based medical image classifiers, their safety and reliability remain major concerns in practical settings. Existing auditing approaches rely mainly on unimodal features or metadata-based subgroup analyses, which offer limited interpretability and often fail to capture hidden systematic failures. To address these limitations, we introduce the first automated auditing framework that extends slice discovery methods to multimodal representations, designed specifically for medical applications. Comprehensive experiments under common failure scenarios on the MIMIC-CXR-JPG dataset demonstrate the framework's strong capability in both failure discovery and explanation generation. Our results further show that multimodal information generally enables more comprehensive and effective auditing of classifiers, while unimodal variants that go beyond image-only inputs show strong potential in resource-constrained settings.
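Neither the summary nor the abstract spells out the mechanics of slice discovery, so the following is only a minimal, generic sketch of the idea (not the paper's actual algorithm): cluster examples in a (here synthetic, stand-in) joint embedding space, then flag clusters whose error rate is far above the overall error rate as candidate failure slices. All names, thresholds, and the toy data are illustrative assumptions.

```python
import random

random.seed(0)


def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic farthest-point init; returns a cluster index per point."""
    centers = [points[0]]
    while len(centers) < k:  # greedily pick the point farthest from all chosen centers
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: dist2(p, centers[c]))
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return assign


# Synthetic stand-in for joint image-text embeddings: two regions where the
# classifier is usually right, plus one hidden region where it usually fails.
embeddings, correct = [], []
for _ in range(100):
    embeddings.append((random.gauss(0, 0.3), random.gauss(0, 0.3)))
    correct.append(random.random() < 0.95)
for _ in range(100):
    embeddings.append((random.gauss(5, 0.3), random.gauss(0, 0.3)))
    correct.append(random.random() < 0.95)
for _ in range(40):  # the hidden failure slice
    embeddings.append((random.gauss(0, 0.3), random.gauss(5, 0.3)))
    correct.append(random.random() < 0.30)

assign = kmeans(embeddings, k=3)
overall_err = 1 - sum(correct) / len(correct)

# Flag clusters whose error rate is well above the overall rate (threshold is arbitrary).
slices = []
for c in range(3):
    idx = [i for i, a in enumerate(assign) if a == c]
    err = 1 - sum(correct[i] for i in idx) / len(idx)
    if err > 2 * overall_err:
        slices.append((c, len(idx), round(err, 2)))

print(slices)  # (cluster id, size, error rate) of discovered failure slices
```

In a real multimodal variant, the embedding would come from a joint image-text encoder, and each flagged cluster would additionally be explained, for example by surfacing the clinical-text concepts its members share, rather than just reported as a coordinate region.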