Hidden Conflicts in Neural Networks and Their Implications for Explainability

📅 2023-10-31
📈 Citations: 3
Influential: 0
🤖 AI Summary
Implicit feature conflicts in artificial neural networks (ANNs)—long overlooked yet profoundly affecting model inference and interpretability—constitute a critical gap in explainable AI. This work establishes the first rigorous theoretical framework for neural conflict and introduces Conflict-Aware Feature-wise Explanations (CAFE), a novel method that explicitly disentangles features' positive and negative contributions. It further uncovers systematic correlations between conflict intensity and types of distributional shift. Experiments on tabular and image data demonstrate that CAFE significantly improves explanation fidelity, outperforming baselines by an average of +18.7%. Moreover, conflict patterns serve as discriminative signatures for out-of-distribution (OOD) subtypes, achieving 89.3% classification accuracy. These findings introduce a quantifiable, diagnosable dimension of explanation—grounded in feature interaction dynamics—thereby advancing the foundations of trustworthy AI.
📝 Abstract
Artificial Neural Networks (ANNs) often represent conflicts between features, arising naturally during training as the network learns to integrate diverse and potentially disagreeing inputs to better predict the target variable. Despite their relevance to the "reasoning" processes of these models, the properties and implications of conflicts for understanding and explaining ANNs remain underexplored. In this paper, we develop a rigorous theory of conflicts in ANNs and demonstrate their impact on ANN explainability through two case studies. In the first case study, we use our theory of conflicts to inspire the design of a novel feature attribution method, which we call Conflict-Aware Feature-wise Explanations (CAFE). CAFE separates the positive and negative influences of features and biases, enabling more faithful explanations for models applied to tabular data. In the second case study, we take preliminary steps towards understanding the role of conflicts in out-of-distribution (OOD) scenarios. Through our experiments, we identify potentially useful connections between model conflicts and different kinds of distributional shifts in tabular and image data. Overall, our findings demonstrate the importance of accounting for conflicts in the development of more reliable explanation methods for AI systems, which are crucial for the beneficial use of these systems in society.
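To make the core idea concrete: the abstract describes CAFE as separating the positive and negative influences of features and biases. A minimal toy sketch of that separation for a single linear unit is shown below; this is an illustrative decomposition only, not the paper's actual CAFE algorithm, and all function names here are hypothetical.

```python
import numpy as np

def split_contributions(weights, features, bias):
    """Split a linear unit's pre-activation into positive and negative parts.

    Toy illustration of separating positive and negative feature/bias
    influences; a coexisting large positive and large negative part
    signals a "conflict" that a net attribution score would hide.
    """
    contributions = weights * features  # per-feature contribution w_i * x_i
    positive = contributions[contributions > 0].sum() + max(bias, 0.0)
    negative = contributions[contributions < 0].sum() + min(bias, 0.0)
    # positive + negative always equals the raw pre-activation
    return positive, negative

w = np.array([0.5, -1.2, 0.3])
x = np.array([2.0, 1.0, 4.0])
b = -0.1
pos, neg = split_contributions(w, x, b)
print(pos, neg)  # pos + neg recovers the raw score w @ x + b
```

A plain attribution would report only the net score (here 0.9), whereas tracking `pos` and `neg` separately reveals that sizeable opposing influences are cancelling out.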
Problem

Research questions and friction points this paper is trying to address.

Understanding hidden conflicts in neural networks' feature interactions
Developing conflict-aware explainability methods for ANN interpretations
Exploring conflicts' role in out-of-distribution model behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Develops theory of conflicts in ANNs
Introduces Conflict-Aware Feature-wise Explanations (CAFE)
Explores conflicts in out-of-distribution scenarios