🤖 AI Summary
Deep learning models’ increasing complexity and inherent “black-box” nature severely hinder their trustworthy deployment in high-stakes applications. To address this, we propose DLBacktrace, a model-agnostic backward-tracing attribution method that introduces the first unified reverse attribution framework. It requires no gradients and makes no assumptions about internal architecture, so it supports PyTorch and TensorFlow models as well as black-box deployments. Notably, it is the first method to enable cross-layer decision tracing for large language models (LLMs) and multimodal models. Our approach integrates perturbation-based sensitivity analysis with path importance reweighting, and is benchmarked against SHAP, LIME, and GradCAM as baselines. Extensive evaluation across image, text, and tabular tasks demonstrates an average 23% improvement in attribution fidelity over these state-of-the-art baselines, enhancing both interpretability and human comprehensibility. The method is open-sourced and validated on BERT, ResNet, U-Net, and custom DNNs.
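As a generic illustration of the gradient-free, perturbation-based sensitivity analysis mentioned above, the following minimal sketch scores each input feature by how much a black-box model's output changes when that feature alone is reset to a baseline value. This is not DLBacktrace's algorithm or API: the names `perturbation_attribution` and `toy_model`, the all-zeros baseline, and the linear toy model are all assumptions made for the example.

```python
from typing import Callable, List, Sequence


def perturbation_attribution(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Score each feature by the output change caused by replacing
    that feature alone with its baseline value (no gradients needed)."""
    if len(x) != len(baseline):
        raise ValueError("x and baseline must have the same length")
    reference = predict(x)  # output on the unperturbed input
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline[i]  # perturb exactly one feature
        scores.append(reference - predict(perturbed))
    return scores


# Hypothetical black-box model: the attribution code only calls it,
# it never inspects weights or layers.
def toy_model(features: Sequence[float]) -> float:
    weights = [2.0, -1.0, 0.0]
    return sum(w * f for w, f in zip(weights, features)) + 0.5


scores = perturbation_attribution(toy_model, [1.0, 3.0, 5.0], [0.0, 0.0, 0.0])
print(scores)  # [2.0, -3.0, 0.0]
```

For a linear model such as this one, each score equals the weight times the feature's distance from the baseline, so the third feature (weight zero) correctly receives no attribution. Real models are nonlinear, which is why practical methods combine many perturbations or, as the summary describes, reweight contributions along paths through the network.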
📝 Abstract
The rapid growth of AI has led to more complex deep learning models, often operating as opaque "black boxes" with limited transparency in their decision-making. This lack of interpretability poses challenges, especially in high-stakes applications where understanding model output is crucial. This work highlights the importance of interpretability in fostering trust, accountability, and responsible deployment. To address these challenges, we introduce DLBacktrace, a novel, model-agnostic technique designed to provide clear insights into deep learning model decisions across a wide range of domains and architectures, including MLPs, CNNs, and Transformer-based LLMs. We present a comprehensive overview of DLBacktrace and benchmark its performance against established interpretability methods such as SHAP, LIME, and GradCAM. Our results demonstrate that DLBacktrace effectively enhances understanding of model behavior across diverse tasks. DLBacktrace is compatible with models developed in both PyTorch and TensorFlow, supporting architectures such as BERT, ResNet, U-Net, and custom DNNs for tabular data. The library is open-sourced and available at https://github.com/AryaXAI/DLBacktrace.