🤖 AI Summary
The limited semantic interpretability of DNN feature spaces makes it difficult to pinpoint which image attributes correspond to which neural activations. To address this, we propose a plug-and-play, fine-tuning-free feature inversion framework that leverages the reverse generative process of a pre-trained diffusion model: given a target feature vector, it guides image generation to minimize the Euclidean distance between the generated image's feature and the target, yielding accurate feature-to-image decoding. Our method is architecture-agnostic, supporting mainstream vision models (including CLIP, ResNet-50, and ViT) without additional training. It enables efficient, high-fidelity visualization of the semantically meaningful patterns these models encode (e.g., texture, shape, and class-discriminative features), thereby exposing their internal representational structure. Experiments demonstrate strong alignment between inverted and target features, with cosine similarity exceeding 0.92, establishing a scalable paradigm for DNN interpretability analysis.
📝 Abstract
One of the key issues with Deep Neural Networks (DNNs) is the black-box nature of their internal feature extraction process. Focusing on vision-related domains, this paper analyses the feature space of a DNN by proposing a decoder that can generate images whose features are guaranteed to closely match a user-specified feature. Owing to this guarantee, which past studies lack, our decoder allows us to reveal which of the various attributes in an image are encoded into a feature by the DNN, by generating images whose features lie close to that feature. Our decoder is implemented as a guided diffusion model that steers the reverse image-generation process of a pre-trained diffusion model to minimise the Euclidean distance between the feature of the clean image estimated at each step and the user-specified feature. One practical advantage of our decoder is that it can analyse the feature spaces of different DNNs without additional training and runs on a single commercial off-the-shelf (COTS) GPU. The experimental results targeting CLIP's image encoder, ResNet-50 and a vision transformer demonstrate that images generated by our decoder have features remarkably similar to the user-specified ones and reveal valuable insights into these DNNs' feature spaces.
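To make the guidance mechanism concrete, the sketch below shows one plausible way to steer a DDPM/DDIM-style reverse process with a feature-matching term: at each step the clean-image estimate is passed through the frozen DNN, and the gradient of the squared Euclidean distance to the target feature is folded into the noise prediction, classifier-guidance style. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the names `eps_model` (the pre-trained noise predictor), `feature_extractor` (the frozen DNN under analysis, e.g. CLIP's image encoder), `alpha_bar` (the cumulative noise schedule), `target_feat`, and the `guidance_scale` value are all placeholders introduced for this example.

```python
import torch

def feature_guided_sampling(eps_model, feature_extractor, target_feat,
                            alpha_bar, shape, guidance_scale=1.0, device="cuda"):
    """Reverse diffusion steered so the generated image's feature approaches target_feat.

    Assumptions (not from the paper): eps_model(x_t, t) returns the predicted noise,
    feature_extractor(x) returns the feature of an image batch, and alpha_bar holds
    the cumulative noise-schedule products for T timesteps.
    """
    T = alpha_bar.shape[0]
    x_t = torch.randn(shape, device=device)  # start from pure noise
    for t in reversed(range(T)):
        a_bar = alpha_bar[t]
        a_bar_prev = alpha_bar[t - 1] if t > 0 else torch.tensor(1.0, device=device)

        # Predict noise and the clean-image estimate x0_hat, keeping the graph so the
        # feature-distance gradient can flow back to x_t.
        x_in = x_t.detach().requires_grad_(True)
        eps = eps_model(x_in, t)
        x0_hat = (x_in - torch.sqrt(1.0 - a_bar) * eps) / torch.sqrt(a_bar)

        # Guidance signal: squared Euclidean distance between the feature of x0_hat
        # and the user-specified target feature.
        dist = ((feature_extractor(x0_hat) - target_feat) ** 2).sum()
        grad = torch.autograd.grad(dist, x_in)[0]

        with torch.no_grad():
            # Fold the distance gradient into the noise prediction, pushing the
            # trajectory toward images whose features match the target.
            eps_guided = eps.detach() + guidance_scale * torch.sqrt(1.0 - a_bar) * grad
            x0_guided = (x_t - torch.sqrt(1.0 - a_bar) * eps_guided) / torch.sqrt(a_bar)
            # Deterministic DDIM-style transition to x_{t-1}.
            x_t = torch.sqrt(a_bar_prev) * x0_guided + torch.sqrt(1.0 - a_bar_prev) * eps_guided
    return x_t
```

In this formulation the clean-image estimate is recomputed under gradient tracking so the distance gradient reaches x_t directly; the paper's decoder may differ in where the guidance term is applied, how it is scaled across timesteps, and whether stochastic (DDPM) or deterministic (DDIM) transitions are used.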