DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models

📅 2024-11-19

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Deep learning models’ increasing complexity and inherent “black-box” nature severely hinder their trustworthy deployment in high-stakes applications. To address this, we propose a model-agnostic backward tracing attribution method that introduces the first unified reverse attribution framework: it requires no gradients and makes no assumptions about internal architecture, and thus supports both PyTorch/TensorFlow models and black-box deployments. Notably, it is the first method enabling cross-layer decision tracing for large language models (LLMs) and multimodal models. Our approach integrates perturbation-based sensitivity analysis with path importance reweighting and is benchmarked against SHAP, LIME, and GradCAM. Extensive evaluation across image, text, and tabular tasks demonstrates an average 23% improvement in attribution fidelity over state-of-the-art baselines, significantly enhancing both interpretability and human comprehensibility. The method is open-sourced and validated on BERT, ResNet, U-Net, and custom DNNs for tabular data.
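The summary describes a gradient-free trace that runs from a model's output back to its inputs. This page does not give DLBacktrace's own propagation rules, so the sketch below substitutes a well-known stand-in: the epsilon rule from layer-wise relevance propagation (LRP), applied to a small dense ReLU network in NumPy. It illustrates the general idea of redistributing an output score layer by layer using only weights and stored activations. It is not the library's algorithm or API, and the names `forward` and `backtrace_relevance` are hypothetical.

```python
import numpy as np


def forward(x, layers):
    """Run a dense ReLU network; return the input followed by every layer's output."""
    activations = [x]
    for i, (W, b) in enumerate(layers):
        z = activations[-1] @ W + b
        # ReLU on hidden layers only; the final layer stays linear (logits)
        activations.append(np.maximum(z, 0.0) if i < len(layers) - 1 else z)
    return activations


def backtrace_relevance(x, layers, target, eps=1e-9):
    """LRP epsilon rule (a stand-in, not DLBacktrace's own rules): push the target
    logit back to the inputs using weights and stored activations, no gradients."""
    activations = forward(x, layers)
    out = activations[-1]
    # All relevance starts on the chosen output unit
    relevance = np.zeros_like(out)
    relevance[target] = out[target]
    # Walk the layers backwards, pairing each (W, b) with that layer's input
    for (W, b), a in zip(reversed(layers), reversed(activations[:-1])):
        z = a @ W + b
        z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilizer: avoids dividing by ~0
        s = relevance / z          # relevance per unit of pre-activation
        relevance = a * (W @ s)    # input j receives a_j * w_jk / z_k of each R_k
    return relevance


rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 8)), np.zeros(8)),
          (rng.normal(size=(8, 3)), np.zeros(3))]
x = rng.normal(size=4)
logits = forward(x, layers)[-1]
target = int(np.argmax(logits))
R = backtrace_relevance(x, layers, target)
# With zero biases, the input relevances sum to the target logit (up to eps)
print(R, R.sum(), logits[target])
```

The zero biases in the usage example are deliberate. With nonzero biases, the bias terms absorb part of the relevance, so the input scores no longer sum exactly to the explained logit.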

📝 Abstract

The rapid growth of AI has led to more complex deep learning models, often operating as opaque "black boxes" with limited transparency in their decision-making. This lack of interpretability poses challenges, especially in high-stakes applications where understanding model output is crucial. This work highlights the importance of interpretability in fostering trust, accountability, and responsible deployment. To address these challenges, we introduce DLBacktrace, a novel, model-agnostic technique designed to provide clear insights into deep learning model decisions across a wide range of domains and architectures, including MLPs, CNNs, and Transformer-based LLMs. We present a comprehensive overview of DLBacktrace and benchmark its performance against established interpretability methods such as SHAP, LIME, and GradCAM. Our results demonstrate that DLBacktrace effectively enhances understanding of model behavior across diverse tasks. DLBacktrace is compatible with models developed in both PyTorch and TensorFlow, supporting architectures such as BERT, ResNet, U-Net, and custom DNNs for tabular data. The library is open-sourced and available at https://github.com/AryaXAI/DLBacktrace .
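The abstract reports benchmarks against SHAP, LIME, and GradCAM, and the summary cites "attribution fidelity", but this page does not say how fidelity was measured. One common, generic way to score an attribution's faithfulness is a deletion test. It replaces features with a baseline value in order of decreasing attribution and tracks how fast the model's score falls. A faster drop, and therefore a lower area under the curve, suggests the ranking better reflects what the model relies on. The sketch below is an assumed illustration rather than the paper's evaluation protocol. The names `deletion_curve` and `deletion_auc` and the toy linear model are hypothetical.

```python
from typing import Callable

import numpy as np


def deletion_curve(predict: Callable[[np.ndarray], float], x: np.ndarray,
                   attribution: np.ndarray, baseline: float = 0.0) -> np.ndarray:
    """Model score after 'deleting' the k most-attributed features, for k = 0..n."""
    order = np.argsort(-attribution)     # most relevant feature first
    x_cur = x.astype(float)              # astype returns a new array, so x is untouched
    scores = [predict(x_cur)]
    for idx in order:
        x_cur[idx] = baseline            # replace one feature with the baseline value
        scores.append(predict(x_cur))
    return np.array(scores, dtype=float)


def deletion_auc(scores: np.ndarray) -> float:
    """Trapezoidal area under the curve on a [0, 1] x-axis; lower means more faithful."""
    return float(np.mean((scores[:-1] + scores[1:]) / 2.0))


w = np.array([3.0, 1.0, 0.5])
x = np.ones(3)


def predict(v: np.ndarray) -> float:
    """Toy linear 'model' used only for this illustration."""
    return float(v @ w)


good = deletion_curve(predict, x, x * w)     # exact attribution for a linear model
bad = deletion_curve(predict, x, -(x * w))   # deliberately reversed ranking
print(deletion_auc(good), deletion_auc(bad)) # the faithful ranking scores lower
```

A zero baseline is the simplest choice here. For images or embeddings, a blurred image or a mean-feature baseline is often used instead, and that choice can change how methods rank against each other.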

Problem

Research questions and friction points this paper is trying to address.

Enhances interpretability of complex deep learning models

Provides model-agnostic insights across various architectures

Supports diverse frameworks like PyTorch and TensorFlow

Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-agnostic explainability technique

Compatible with various DNN architectures

Open-source library for explaining deep learning models