A Close Look at Decomposition-based XAI-Methods for Transformer Language Models

📅 2025-02-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of systematic comparative evaluation of decomposition-based XAI methods—specifically ALTI-Logit and Layer-wise Relevance Propagation (LRP)—in Transformer language models. We conduct the first quantitative and qualitative analysis of their attribution performance under a unified benchmark, focusing on subject–verb agreement, a syntax-sensitive linguistic task. Experiments span BERT, GPT-2, and LLaMA-3, and include AttnLRP, a recently proposed LRP variant designed for attention mechanisms. We propose algorithmic and implementation optimizations and construct a publicly available, manually annotated dataset dedicated to attribution evaluation in language models. Results reveal method-specific attribution biases and consistency disparities in syntactic reasoning. All code and data are open-sourced to support standardized, reproducible XAI evaluation for language models.

📝 Abstract
Various XAI attribution methods have recently been proposed for the transformer architecture, allowing for insights into the decision-making process of large language models by assigning importance scores to input tokens and intermediate representations. One promising class of methods comprises decomposition-based approaches, i.e., XAI methods that redistribute the model's prediction logit through the network, as this value is directly related to the prediction. We note, however, that two prominent methods of this category, namely ALTI-Logit and LRP, have not yet been analyzed in juxtaposition. We propose to close this gap by conducting a careful quantitative evaluation w.r.t. ground-truth annotations on a subject-verb agreement task, as well as various qualitative inspections, using BERT, GPT-2, and LLaMA-3 as a testbed. Along the way, we compare and extend the ALTI-Logit and LRP methods, including the recently proposed AttnLRP variant, from an algorithmic and implementation perspective. We further incorporate two widely used gradient-based attribution techniques into our benchmark. Finally, we make our carefully constructed benchmark dataset for evaluating attributions on language models, as well as our code, publicly available in order to foster evaluation of XAI methods on a well-defined common ground.
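The decomposition idea described in the abstract can be illustrated with a minimal sketch: for a single linear layer, the epsilon-LRP rule redistributes the output relevance (initialized with the prediction logit) onto the inputs in proportion to each input's contribution, so that relevance is approximately conserved layer by layer. This is a generic NumPy illustration of the LRP principle, not the paper's implementation; the function name and the epsilon stabilizer value are assumptions.

```python
import numpy as np

def lrp_linear(x, W, b, relevance_out, eps=1e-6):
    """Epsilon-LRP rule for a linear layer y = x @ W + b.

    Redistributes relevance_out onto the inputs x in proportion
    to each input's contribution x_i * W_ij to the pre-activation z_j.
    """
    z = x @ W + b                                # pre-activations, shape (out,)
    s = relevance_out / (z + eps * np.sign(z))   # stabilized relevance ratio
    return x * (W @ s)                           # input relevance, shape (in,)

# Toy check: start from the logits themselves and verify that
# relevance is (approximately) conserved when b = 0.
x = np.array([1.0, 2.0, -1.0])
W = np.random.default_rng(0).normal(size=(3, 2))
b = np.zeros(2)
R_out = x @ W + b                  # initialize relevance with the output logits
R_in = lrp_linear(x, W, b, R_out)  # sums to approximately R_out.sum()
```

Applied through a full transformer, rules of this kind are chained backwards layer by layer; the methods compared in the paper differ in how they handle the attention and normalization components.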
Problem

Research questions and friction points this paper is trying to address.

Analyze ALTI-Logit and LRP attribution methods
Evaluate XAI methods on subject-verb agreement tasks
Compare and extend ALTI-Logit, LRP, and AttnLRP techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposition-based XAI methods
Comparative analysis of ALTI-Logit and LRP
Public benchmark dataset and code
Leila Arras
Research Associate, Fraunhofer HHI, BIFOLD, Berlin, Germany
Machine Learning · Neural Networks · Natural Language Processing · Visual Reasoning · Interpretability
Bruno Puri
Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany; Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
Patrick Kahardipraja
Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
Sebastian Lapuschkin
Head of Explainable AI, Fraunhofer Heinrich Hertz Institute
Interpretability · Explainable AI · XAI · Machine Learning · Artificial Intelligence
Wojciech Samek
Professor at TU Berlin, Head of AI Department at Fraunhofer HHI, BIFOLD Fellow
Deep Learning · Interpretability · Explainable AI · Trustworthy AI · Federated Learning