B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable

📅 2024-11-01
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited interpretability of pretrained CNNs, Vision Transformers (ViTs), and CLIP models, and the high cost of retraining them from scratch. We propose *B-cosification*: a plug-and-play method that replaces linear layers with B-cos transformations—introducing a weight-input cosine alignment constraint—and couples it with a modular, lightweight fine-tuning protocol. Crucially, B-cosification enables conversion of existing large models into inherently interpretable architectures without full retraining. On benchmarks including ImageNet, B-cosified models maintain or even improve classification accuracy. B-cosified CLIP achieves zero-shot performance on par with the original model, while its attribution quality matches that of B-cos models trained from scratch, at a fraction of the training cost. All code and models are publicly released.

📝 Abstract
B-cos Networks have been shown to be effective for obtaining highly human-interpretable explanations of model decisions by architecturally enforcing stronger alignment between inputs and weights. B-cos variants of convolutional neural networks (CNNs) and vision transformers (ViTs), which primarily replace linear layers with B-cos transformations, perform competitively with their respective standard variants while also yielding explanations that are faithful by design. However, it has so far been necessary to train these models from scratch, which is increasingly infeasible in the era of large, pre-trained foundation models. In this work, inspired by the architectural similarities between standard DNNs and B-cos networks, we propose 'B-cosification', a novel approach to transform existing pre-trained models to become inherently interpretable. We perform a thorough study of design choices for this conversion, both for convolutional neural networks and vision transformers. We find that B-cosification can yield models that are on par with B-cos models trained from scratch in terms of interpretability, while often outperforming them in terms of classification performance at a fraction of the training cost. Subsequently, we apply B-cosification to a pretrained CLIP model, and show that, even with limited data and compute cost, we obtain a B-cosified version that is highly interpretable and competitive on zero-shot performance across a variety of datasets. We release our code and pre-trained model weights at https://github.com/shrebox/B-cosification.
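The core mechanism the abstract describes—replacing a linear layer with a B-cos transformation that enforces weight-input alignment—can be sketched as follows. This is a minimal illustrative sketch of a single B-cos unit, not the authors' implementation: it assumes the standard B-cos form from the B-cos Networks line of work, where the linear response ŵᵀx (with unit-norm weight ŵ) is downscaled by |cos(x, w)|^(B-1), so that for B = 1 the unit is an ordinary (normalized) linear map, and for B > 1 the output is suppressed unless the input aligns with the weight direction.

```python
import math

def bcos_transform(x, w, B=2.0, eps=1e-9):
    """Illustrative sketch of one B-cos unit (not the authors' code).

    Computes |cos(x, w)|^(B-1) * (w_hat . x), where w_hat = w / ||w||.
    B = 1 recovers a linear map with normalized weights; larger B
    forces the output toward zero unless x aligns with w.
    """
    dot = sum(wi * xi for wi, xi in zip(w, x))
    w_norm = math.sqrt(sum(wi * wi for wi in w)) + eps
    x_norm = math.sqrt(sum(xi * xi for xi in x)) + eps
    lin = dot / w_norm           # ŵᵀx: linear response with unit-norm weight
    cos = lin / x_norm           # cosine of the angle between x and w
    return abs(cos) ** (B - 1.0) * lin
```

For an input perfectly aligned with the weight, the unit behaves like the normalized linear map; for an orthogonal input it outputs zero, which is what makes the resulting weight-input contributions directly interpretable as explanations.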
Problem

Research questions and friction points this paper is trying to address.

Model Interpretability
Transfer Learning
Zero-shot Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

B-cosification
Interpretable Models
Zero-shot Performance
Shreyash Arya
Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany; RTG Neuroexplicit Models of Language, Vision, and Action, Saarbrücken, Germany
Sukrut Rao
Max Planck Institute for Informatics, Saarland Informatics Campus
Explainability · Computer Vision · Deep Learning
Moritz Böhle
Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany; Kyutai, Paris, France
B. Schiele
Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany