Enhancing Pre-trained Representation Classifiability can Boost its Interpretability

📅 2025-10-28
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether the classification capability and interpretability of pre-trained vision models can be jointly enhanced. Addressing the unclear relationship between these two properties, the authors propose the Inherent Interpretability Score (IIS)β€”a metric quantifying the proportion of interpretable semantics in model representations via explanation methodsβ€”and theoretically and empirically establish a significant positive correlation between IIS and classification accuracy. Building on this insight, they design a fine-grained explanation-guided fine-tuning paradigm that directly optimizes IIS to drive interpretability-aware representation learning. Extensive evaluation across multiple vision tasks demonstrates that the method not only improves classification accuracy but also substantially reduces the performance degradation induced by post-hoc explanations. The core contributions are twofold: (1) the first quantitative characterization of a positive correlation between interpretability and classification capability in vision representations; and (2) the first training paradigm that explicitly maximizes interpretability as an optimization objective while concurrently enhancing downstream classification performance.

πŸ“ Abstract
The visual representation of a pre-trained model prioritizes the classifiability on downstream tasks, while the widespread applications of pre-trained visual models have posed new requirements for representation interpretability. However, it remains unclear whether the pre-trained representations can achieve high interpretability and classifiability simultaneously. To answer this question, we quantify the representation interpretability by leveraging its correlation with the ratio of interpretable semantics within the representations. Given the pre-trained representations, only the interpretable semantics can be captured by interpretations, whereas the uninterpretable part leads to information loss. Based on this fact, we propose the Inherent Interpretability Score (IIS) that evaluates the information loss, measures the ratio of interpretable semantics, and quantifies the representation interpretability. In the evaluation of the representation interpretability with different classifiability, we surprisingly discover that the interpretability and classifiability are positively correlated, i.e., representations with higher classifiability provide more interpretable semantics that can be captured in the interpretations. This observation further supports two benefits to the pre-trained representations. First, the classifiability of representations can be further improved by fine-tuning with interpretability maximization. Second, with the classifiability improvement for the representations, we obtain predictions based on their interpretations with less accuracy degradation. The discovered positive correlation and corresponding applications show that practitioners can unify the improvements in interpretability and classifiability for pre-trained vision models. Codes are available at https://github.com/ssfgunner/IIS.
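The abstract describes IIS as measuring the information lost when a representation is reduced to the semantics an explanation method can capture. The paper's exact formulation lives in the linked repository; as a rough illustration only, the idea can be sketched as "one minus the normalized reconstruction loss" between the full representation and its interpretation-recovered counterpart. All names and the specific formula below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def interpretability_score(reps, interpreted_reps):
    """Hypothetical IIS-style sketch: fraction of representation
    "energy" retained after keeping only the semantics recovered
    by an explanation method. NOT the paper's exact formula."""
    reps = np.asarray(reps, dtype=float)
    interpreted_reps = np.asarray(interpreted_reps, dtype=float)
    # Information loss: squared gap between the full representation
    # and the part the interpretation captures.
    loss = np.sum((reps - interpreted_reps) ** 2)
    total = np.sum(reps ** 2)
    # Higher score -> the interpretation explains more of the representation.
    return 1.0 - loss / total

# Toy example: the interpretation recovers most of the representation,
# so the score should be close to 1.
reps = np.array([[1.0, 2.0], [3.0, 4.0]])
interpreted = np.array([[0.9, 2.1], [2.8, 4.0]])
score = interpretability_score(reps, interpreted)
```

Under this reading, fine-tuning with interpretability maximization would push `interpreted` toward `reps`, driving the score toward 1 while (per the paper's finding) also improving classifiability.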
Problem

Research questions and friction points this paper is trying to address.

Quantifying interpretability of pre-trained representations using semantic ratios
Investigating correlation between representation interpretability and classifiability
Enhancing classifiability through interpretability maximization in vision models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes Inherent Interpretability Score to quantify representation interpretability
Discovers positive correlation between classifiability and interpretability in representations
Enhances classifiability via fine-tuning with interpretability maximization
Shufan Shen
Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS
Zhaobo Qi
HIT
video understanding, multimodal reasoning
Junshu Sun
Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS
Qingming Huang
University of the Chinese Academy of Sciences
Multimedia Analysis and Retrieval, Image and Video Processing, Pattern Recognition, Computer Vision, Video Coding
Qi Tian
Huawei Inc.
Shuhui Wang
Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS