🤖 AI Summary
Deep neural networks’ visual decision-making lacks interpretability, risking unwarranted trust and latent biases. To address this, we propose *Cluster Paths*: a post-hoc method that clusters hierarchical neural activations into human-readable, layer-wise decision paths, explicitly revealing the color, texture, and contextual visual concepts the model relies on at each depth. We introduce a four-dimensional quantitative evaluation framework assessing cognitive load, class alignment, prediction fidelity, and perturbation stability. We further integrate large language models to generate high-level semantic concept paths, enabling out-of-distribution (OOD) anomaly detection. On a spurious-cue CIFAR-10 task, cluster paths expose color-based shortcuts; on a five-class CelebA hair-color task they reach 90% faithfulness and 96% stability under perturbation. The approach scales to ImageNet-pretrained Vision Transformers (ViTs) and supports efficient, reliable OOD detection.
📝 Abstract
While modern deep neural networks achieve impressive performance in vision tasks, they remain opaque in their decision processes, risking unwarranted trust, undetected biases, and unexpected failures. We propose cluster paths, a post-hoc interpretability method that clusters activations at selected layers and represents each input as its sequence of cluster IDs. To assess these cluster paths, we introduce four metrics: path complexity (cognitive load), weighted-path purity (class alignment), decision-alignment faithfulness (predictive fidelity), and path agreement (stability under perturbations). In a spurious-cue CIFAR-10 experiment, cluster paths identify color-based shortcuts and collapse when the cue is removed. On a five-class CelebA hair-color task, they achieve 90% faithfulness and maintain 96% agreement under Gaussian noise without sacrificing accuracy. Scaling to a Vision Transformer pretrained on ImageNet, we extend cluster paths to concept paths derived from prompting a large language model on minimal path divergences. Finally, we show that cluster paths can serve as an effective out-of-distribution (OOD) detector, reliably flagging anomalous samples before the model generates over-confident predictions. Cluster paths uncover visual concepts, such as color palettes, textures, and object contexts, at multiple network depths, demonstrating that the method scales to large vision models while producing concise, human-readable explanations.
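The core construction described above — cluster the activations at a few selected layers, then read off each input's sequence of cluster IDs — can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it uses synthetic Gaussian arrays as stand-ins for extracted layer activations, a tiny hand-rolled k-means (the paper's choice of clustering algorithm and layer selection are not specified here), and a simple version of the path-agreement idea, measured as the fraction of samples whose path is unchanged after small Gaussian perturbation of the activations.

```python
import numpy as np

def kmeans(X, k, iters=25, seed=0):
    """Minimal k-means; returns (centers, labels). Illustrative only."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers, labels

def assign(X, centers):
    """Map new activations to their nearest existing cluster center."""
    return np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)

rng = np.random.default_rng(1)
# Stand-in activations for 200 samples at three "selected layers".
# In practice these would be hidden activations from a trained network.
layers = [rng.normal(size=(200, d)) for d in (32, 64, 128)]

# Cluster each layer; a sample's path is its tuple of per-layer cluster IDs.
models = [kmeans(acts, k=5) for acts in layers]
paths = np.stack([labels for _, labels in models], axis=1)  # shape (200, 3)

# Path agreement: fraction of samples whose path survives a small
# Gaussian perturbation of the activations (hypothetical noise scale).
noisy = [acts + 0.01 * rng.normal(size=acts.shape) for acts in layers]
noisy_paths = np.stack(
    [assign(a, centers) for a, (centers, _) in zip(noisy, models)], axis=1
)
agreement = (paths == noisy_paths).all(axis=1).mean()
```

Because a path is just a short discrete code, two samples can be compared by exact path equality, and rare or unseen paths on new inputs are a natural signal for the OOD use the abstract describes.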