🤖 AI Summary
This work addresses the limited transparency and interpretability of deep learning models for medical imaging, which stems from the black-box nature of backpropagation-based optimization. To meet stringent clinical demands for explainable decision-making, the authors propose A-ROM (Aristotelian Rapid Object Modeling), a framework that introduces, for the first time, a fine-tuning-free, interpretable "Aristotelian" modeling paradigm to medical image analysis. Built upon a pretrained Vision Transformer, A-ROM constructs a universal metric space and applies layer-wise representation analysis, without backpropagation, together with a human-readable concept dictionary and a k-nearest neighbors classifier to enable rapid, transparent few-shot modeling of medical concepts. Evaluated on MedMNIST v2, the method achieves performance comparable to standard fine-tuning while substantially improving interpretability, in line with clinical requirements for transparent and trustworthy AI-assisted diagnosis.
📝 Abstract
While deep learning has achieved remarkable success in medical imaging, the "black-box" nature of backpropagation-based models remains a significant barrier to clinical adoption. To bridge this gap, we propose Aristotelian Rapid Object Modeling (A-ROM), a framework built upon the Platonic Representation Hypothesis (PRH). This hypothesis posits that models trained on vast, diverse datasets converge toward a universal and objective representation of reality. By leveraging the generalizable metric space of pretrained Vision Transformers (ViTs), A-ROM enables the rapid modeling of novel medical concepts without the computational burden or opacity of further gradient-based fine-tuning. We replace traditional, opaque decision layers with a human-readable concept dictionary and a k-Nearest Neighbors (kNN) classifier so that the model's logic remains interpretable. Experiments on the MedMNIST v2 suite demonstrate that A-ROM delivers performance competitive with standard benchmarks while providing a simple, scalable "few-shot" solution that meets the rigorous transparency demands of modern clinical environments.
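The decision step described above, a labeled concept dictionary queried by a kNN classifier over frozen embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation. The three-dimensional vectors stand in for features that a frozen pretrained ViT would produce. The labels "benign" and "malignant", the cosine similarity measure, the value k = 3, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_predict(query, dictionary, k=3):
    """Classify `query` by majority vote over its k most similar entries.

    `dictionary` is a list of (embedding, concept_label) pairs. The
    neighbors are returned alongside the label so the evidence behind
    each decision can be inspected.
    """
    ranked = sorted(dictionary, key=lambda e: cosine(query, e[0]), reverse=True)
    neighbors = ranked[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors

# Hypothetical few-shot concept dictionary: toy vectors in place of
# frozen ViT embeddings, with made-up concept labels.
concept_dictionary = [
    ([1.0, 0.1, 0.0], "benign"),
    ([0.9, 0.2, 0.1], "benign"),
    ([0.8, 0.0, 0.2], "benign"),
    ([0.0, 1.0, 0.1], "malignant"),
    ([0.1, 0.9, 0.0], "malignant"),
    ([0.2, 0.8, 0.1], "malignant"),
]

query = [0.15, 0.95, 0.05]  # stand-in embedding of a new image
label, evidence = knn_predict(query, concept_dictionary, k=3)
print(label)
for vec, lbl in evidence:
    print(lbl, round(cosine(query, vec), 3))
```

Adding a new concept in this sketch requires only appending labeled embeddings to the dictionary, with no gradient updates, and every prediction can be traced back to the specific labeled examples that voted for it.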