🤖 AI Summary
Problem: Deep visual models encode visual concepts in highly distributed representations, which makes it hard to pinpoint the neurons that carry a specific concept.
Method: The paper introduces Granular Concept Circuit (GCC), presented as the first fine-grained circuit discovery method tailored to specific visual concepts. It builds concept-related neural circuits iteratively. At each step it assesses the connections between neurons by jointly considering their functional dependencies and their semantic alignment, and it needs no human annotations. The resulting circuits are semantically consistent and structurally interpretable. A single query can yield several circuits, each discovered automatically and tied to a distinct concept.
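The following minimal sketch shows how an edge score might combine the two criteria named above. It is a hypothetical illustration and not the GCC scoring rule from the paper. Several details are assumptions:

- Each neuron is represented by its activations over the same set of query images.
- Functional dependency is measured as the relative drop in a child neuron's activation when its parent is ablated.
- Semantic alignment is measured as the cosine similarity of the two activation profiles.
- The two terms are combined by multiplication, so a connection scores highly only when both criteria hold.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two activation profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def functional_dependency(child: Sequence[float], child_ablated: Sequence[float]) -> float:
    """Assumed dependency measure: relative drop in the child's total activation
    when the parent neuron is ablated, clamped to be non-negative."""
    total = sum(child)
    if total <= 0.0:
        return 0.0
    drop = sum(c - c_abl for c, c_abl in zip(child, child_ablated))
    return max(0.0, drop / total)


def edge_score(
    parent: Sequence[float],
    child: Sequence[float],
    child_ablated: Sequence[float],
) -> float:
    """Hypothetical edge score: the product of functional dependency and
    (non-negative) semantic alignment, so both must be high for a strong edge."""
    dependency = functional_dependency(child, child_ablated)
    alignment = max(0.0, cosine_similarity(parent, child))
    return dependency * alignment


if __name__ == "__main__":
    # Child activation halves when the parent is ablated, and the two profiles
    # are parallel, so the score is roughly 0.5 * 1.0.
    print(edge_score([1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [1.0, 2.0, 0.0]))
```

A product is used here instead of a weighted sum. That choice is an assumption made to reflect "both functional dependencies and semantic alignment": a neuron that is strongly affected by ablation but responds to unrelated images still gets a score of zero.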
Contribution/Results: The method was evaluated on diverse mainstream image classification models, including ResNet and ViT. It significantly improves spatial localization accuracy and semantic fidelity at the concept level. The result is a scalable, verifiable, and fine-grained framework for analyzing internal model representations, which moves interpretability beyond coarse attribution and post-hoc explanation.
📝 Abstract
Deep vision models have achieved remarkable classification performance by leveraging a hierarchical architecture in which human-interpretable concepts emerge through the composition of individual neurons across layers. Because these representations are distributed, pinpointing where specific visual concepts are encoded within a model remains a crucial yet challenging task. In this paper, we introduce an effective circuit discovery method, called Granular Concept Circuit (GCC), in which each circuit represents a concept relevant to a given query. To construct each circuit, our method iteratively assesses inter-neuron connectivity, focusing on both functional dependencies and semantic alignment. By automatically discovering multiple circuits, each capturing a specific concept within the query, our approach offers an in-depth, concept-wise interpretation of models and is the first to identify circuits tied to specific visual concepts at a fine-grained level. We validate the versatility and effectiveness of GCCs across various deep image classification models.
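The following minimal sketch illustrates how iterative construction could yield multiple circuits for a single query. It is not the authors' algorithm. The sketch makes these assumptions:

- Edge scores between neurons in consecutive layers have already been computed and are passed in as a dictionary.
- Each circuit grows forward from a hypothetical seed neuron by repeatedly adding children whose edge score passes a fixed threshold.
- Neurons are named by a made-up `(layer, channel)` tuple.

The actual seed selection, stopping criterion, and scoring in GCC may differ.

```python
from collections import defaultdict, deque

# Hypothetical neuron identifier: (layer name, channel index).
Neuron = tuple[str, int]
Edge = tuple[Neuron, Neuron]


def build_adjacency(edge_scores: dict[Edge, float], threshold: float) -> dict[Neuron, list[Neuron]]:
    """Keep only parent -> child edges whose score reaches the threshold."""
    adjacency: dict[Neuron, list[Neuron]] = defaultdict(list)
    for (parent, child), score in sorted(edge_scores.items()):
        if score >= threshold:
            adjacency[parent].append(child)
    return adjacency


def grow_circuit(seed: Neuron, adjacency: dict[Neuron, list[Neuron]]) -> tuple[set[Neuron], set[Edge]]:
    """Iteratively expand from one seed, one frontier neuron at a time."""
    nodes: set[Neuron] = {seed}
    edges: set[Edge] = set()
    frontier = deque([seed])
    while frontier:
        node = frontier.popleft()
        for child in adjacency.get(node, []):
            edges.add((node, child))
            if child not in nodes:
                nodes.add(child)
                frontier.append(child)
    return nodes, edges


def discover_circuits(
    seeds: list[Neuron], edge_scores: dict[Edge, float], threshold: float = 0.3
) -> dict[Neuron, tuple[set[Neuron], set[Edge]]]:
    """One circuit per seed; each circuit is meant to capture one concept."""
    adjacency = build_adjacency(edge_scores, threshold)
    return {seed: grow_circuit(seed, adjacency) for seed in seeds}


if __name__ == "__main__":
    scores = {
        (("layer1", 0), ("layer2", 3)): 0.8,
        (("layer1", 0), ("layer2", 5)): 0.1,
        (("layer2", 3), ("layer3", 7)): 0.6,
        (("layer1", 1), ("layer2", 5)): 0.7,
        (("layer2", 5), ("layer3", 7)): 0.2,
    }
    for seed, (nodes, _) in discover_circuits([("layer1", 0), ("layer1", 1)], scores).items():
        print(seed, sorted(nodes))
```

In this toy example, the two seeds produce two separate circuits. The weak edges (scores 0.1 and 0.2) fall below the threshold, so they are pruned and each circuit stays focused on its own path through the network.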