🤖 AI Summary
Existing post-hoc interpretability methods (e.g., GradCAM) produce heatmaps that lack conceptual clarity, while prototype-based approaches (e.g., ProtoPNet, PIPNet) rely on fixed local patches, which compromises robustness and semantic consistency. To address these limitations, we propose PCMNet, a novel framework that jointly performs unsupervised part discovery and end-to-end differentiable prototype learning to dynamically identify semantically clear, structurally coherent image-part prototypes. PCMNet introduces the first dynamic part-prototype clustering mechanism, which removes the reliance on predefined patches and groups prototypes into semantically aligned concepts. It requires no additional annotations and achieves strong interpretability, stability, and occlusion robustness. Extensive experiments across multiple benchmarks demonstrate that PCMNet significantly outperforms ProtoPNet, PIPNet, and GradCAM in explanation quality, classification accuracy, and cross-sample stability, particularly under occlusion.
📝 Abstract
Deep learning has provided considerable advancements for multimedia systems, yet the interpretability of deep models remains a challenge. State-of-the-art post-hoc explainability methods, such as GradCAM, provide visual interpretation based on heatmaps but lack conceptual clarity. Prototype-based approaches, like ProtoPNet and PIPNet, offer a more structured explanation but rely on fixed patches, limiting their robustness and semantic consistency. To address these limitations, a part-prototypical concept mining network (PCMNet) is proposed that dynamically learns interpretable prototypes from meaningful regions. PCMNet clusters prototypes into concept groups, creating semantically grounded explanations without requiring additional annotations. Through a joint process of unsupervised part discovery and concept activation vector extraction, PCMNet effectively captures discriminative concepts and makes interpretable classification decisions. Our extensive experiments comparing PCMNet against state-of-the-art methods on multiple datasets show that it can provide a high level of interpretability, stability, and robustness under clean and occluded scenarios.
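To make the idea of clustering prototypes into concept groups more concrete, here is a minimal sketch. The abstract does not say which clustering procedure PCMNet uses, so this example assumes a plain spherical k-means over prototype vectors as a stand-in. The function name `cluster_prototypes`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def cluster_prototypes(prototypes, num_groups, num_iters=20, seed=0):
    """Group prototype vectors into concept groups.

    Assumption: spherical k-means (cosine similarity) stands in for
    whatever grouping mechanism PCMNet actually uses.
    """
    rng = np.random.default_rng(seed)
    # L2-normalize so that dot products are cosine similarities.
    norms = np.linalg.norm(prototypes, axis=1, keepdims=True)
    protos = prototypes / (norms + 1e-12)
    # Initialize group centers from randomly chosen prototypes.
    init_idx = rng.choice(len(protos), size=num_groups, replace=False)
    centers = protos[init_idx].copy()
    for _ in range(num_iters):
        # Assign each prototype to its most similar center.
        assign = (protos @ centers.T).argmax(axis=1)
        # Move each center to the normalized mean of its members.
        for g in range(num_groups):
            members = protos[assign == g]
            if len(members) > 0:
                mean = members.mean(axis=0)
                centers[g] = mean / (np.linalg.norm(mean) + 1e-12)
    # Final assignment against the updated centers.
    assign = (protos @ centers.T).argmax(axis=1)
    return assign, centers


# Toy example: six 2-D "prototypes" forming two obvious directions.
toy_prototypes = np.array([
    [1.0, 0.0], [1.0, 0.1], [1.0, 0.2],
    [0.0, 1.0], [0.1, 1.0], [0.2, 1.0],
])
groups, group_centers = cluster_prototypes(toy_prototypes, num_groups=2)
```

In this sketch, `groups` assigns each prototype a concept-group index and `group_centers` holds one unit-length direction per group. In a real model the prototype vectors would come from learned part features rather than hand-written toy data.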