🤖 AI Summary
Image classification models suffer from poor interpretability, and pretrained representations are difficult to decompose into semantically meaningful components. Method: The paper proposes InfoDisent, an information-bottleneck-based prototype disentanglement method that adaptively decomposes final-layer features into interpretable, atomic prototype parts. It extends the prototypical-parts explanation paradigm, previously limited to small-scale settings, to large-scale scenarios such as ImageNet, and combines the flexibility of post-hoc explanation with the concept-level modeling of self-explainable networks, supporting both ViT and CNN backbones. Contribution/Results: By jointly optimizing feature disentanglement under an information-compression constraint, the method significantly improves explanation fidelity and concept consistency across multiple benchmark datasets. User studies confirm that the generated prototypes are highly interpretable to humans and practically useful in diagnostic settings, providing fine-grained, verifiable semantic grounding for model decisions.
📝 Abstract
In this work, we introduce InfoDisent, a hybrid approach to explainability based on the information bottleneck principle. InfoDisent enables the disentanglement of information in the final layer of any pretrained model into atomic concepts, which can be interpreted as prototypical parts. This approach merges the flexibility of post-hoc methods with the concept-level modeling capabilities of self-explainable neural networks, such as ProtoPNets. We demonstrate the effectiveness of InfoDisent through computational experiments and user studies across various datasets using modern backbones such as ViTs and convolutional networks. Notably, InfoDisent generalizes the prototypical parts approach to novel domains (ImageNet).
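The core idea, disentangling a frozen backbone's final-layer features into a small set of prototype parts whose spatial responses ground the prediction, can be illustrated with a minimal forward-pass sketch. This is a conceptual toy, not the paper's implementation: the shapes, the 1x1 channel-mixing map `W_mix`, the max-pooled prototype scores, and the entropy term used as a stand-in for the information-bottleneck compression objective are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: a frozen pretrained backbone yields a (C, H, W) feature map.
C, H, W, K, NUM_CLASSES = 512, 7, 7, 64, 10
feats = rng.standard_normal((C, H, W))

# A 1x1 "mixing" map re-expresses the C channels as K prototype parts
# (illustrative stand-in for the learned disentangling transform).
W_mix = rng.standard_normal((K, C)) * 0.01
proto_maps = np.maximum(0.0, np.einsum('kc,chw->khw', W_mix, feats))

# Each prototype's score is its strongest spatial response, so every
# score can be traced back to an image region (the prototypical part).
scores = proto_maps.reshape(K, -1).max(axis=1)

# Linear head: class logits as a transparent combination of prototype scores.
W_cls = rng.standard_normal((NUM_CLASSES, K)) * 0.01
logits = W_cls @ scores

# Compression surrogate: low entropy over prototype activations means
# few prototypes explain the decision (the bottleneck intuition).
p = np.exp(scores - scores.max())
p /= p.sum()
entropy = -(p * np.log(p + 1e-12)).sum()
```

Because the logits are a linear function of per-prototype scores, each prediction decomposes into contributions from individual parts, which is what makes the explanation both post-hoc (any pretrained backbone) and concept-level (ProtoPNet-style).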