🤖 AI Summary
Existing prototypical part networks (e.g., ProtoPNet) suffer from prototype redundancy and semantic overlap, yielding explanations that lack diversity and discriminability. To address this, the authors propose a non-parametric part-prototype learning framework: deep features from foundation vision models (e.g., ViT backbones) are clustered in an unsupervised, non-parametric fashion to automatically discover semantically distinct, diverse, and non-redundant part prototypes per class. Two new quantitative metrics, the Distinctiveness Score and the Comprehensiveness Score, are introduced to evaluate explanation quality. Classification is performed by matching image features against the learned part prototypes. On CUB-200-2011, Stanford Cars, and Stanford Dogs, the method compares favourably against existing ProtoPNets in classification accuracy while providing better interpretability. The code is publicly available.
📝 Abstract
Classifying images with an interpretable decision-making process is a long-standing problem in computer vision. In recent years, Prototypical Part Networks have gained traction as an approach to self-explainable neural networks, due to their ability to mimic human visual reasoning by providing explanations based on prototypical object parts. However, the quality of the explanations generated by these methods leaves room for improvement, as the prototypes usually focus on repetitive and redundant concepts. Leveraging recent advances in prototype learning, we present a framework for part-based interpretable image classification that learns a set of semantically distinctive object parts for each class and provides diverse and comprehensive explanations. The core of our method is to learn the part prototypes in a non-parametric fashion, by clustering deep features extracted from foundation vision models that encode robust semantic information. To quantitatively evaluate the quality of explanations provided by ProtoPNets, we introduce the Distinctiveness Score and the Comprehensiveness Score. Through evaluation on the CUB-200-2011, Stanford Cars and Stanford Dogs datasets, we show that our framework compares favourably against existing ProtoPNets while achieving better interpretability. Code is available at: https://github.com/zijizhu/proto-non-param.
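To make the pipeline concrete, the following is a minimal sketch of the two ingredients the abstract describes: forming class-level part prototypes by clustering patch features from a frozen vision backbone, and scoring an image by matching its patches against those prototypes. This is not the authors' implementation; the feature dimension, the number of parts `k`, the use of plain k-means (the paper's clustering is non-parametric), and the cosine-similarity matching rule are all illustrative assumptions, with random arrays standing in for backbone features.

```python
import numpy as np

def kmeans(features: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Cluster patch features of shape (N, D) into k centroids of shape (k, D).

    Stand-in for the paper's non-parametric clustering step.
    """
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each patch feature to its nearest centroid.
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned features.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def match_scores(image_patches: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Max cosine similarity between any image patch and each part prototype."""
    a = image_patches / np.linalg.norm(image_patches, axis=1, keepdims=True)
    b = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return (a @ b.T).max(axis=0)  # one score per prototype

# Toy usage: random "patch features" in place of real backbone outputs.
rng = np.random.default_rng(1)
class_patch_feats = rng.normal(size=(200, 64))  # patches pooled over one class
prototypes = kmeans(class_patch_feats, k=5)     # 5 part prototypes per class
scores = match_scores(rng.normal(size=(49, 64)), prototypes)
print(scores.shape)  # (5,)
```

In a full model, such per-prototype scores would feed a classification head, and the prototypes would serve as the visual evidence shown in explanations.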