On the reliability of feature attribution methods for speech classification

📅 2025-05-22

📈 Citations: 0

✨ Influential: 0

career value

196K/year

🤖 AI Summary

This study systematically evaluates the reliability of feature attribution methods in speech classification tasks, revealing their significant sensitivity to temporal structure, perturbation strategies, and aggregation time scales. We identify, for the first time, the poor robustness and instability of mainstream attribution methods—including Grad-CAM and Integrated Gradients—in speech domains. To address this, we propose a word-aligned perturbation strategy that leverages speech-text alignment to perform fine-grained, semantically consistent perturbations, substantially improving attribution faithfulness for word-level classification. Our evaluation employs a multi-scale perturbation baseline (frame-level and word-level masking), diverse attribution aggregation schemes, and a cross-task reliability measurement framework. Experiments demonstrate that the proposed strategy improves attribution consistency by up to 42%. This work establishes the first reliable, temporally structured, and semantics-aware attribution paradigm for explainable AI in speech processing.

Technology Category

Application Category

📝 Abstract

As the capabilities of large-scale pre-trained models evolve, understanding the determinants of their outputs becomes more important. Feature attribution aims to reveal which parts of the input elements contribute the most to model outputs. In speech processing, the unique characteristics of the input signal make the application of feature attribution methods challenging. We study how factors such as input type and aggregation and perturbation timespan impact the reliability of standard feature attribution methods, and how these factors interact with characteristics of each classification task. We find that standard approaches to feature attribution are generally unreliable when applied to the speech domain, with the exception of word-aligned perturbation methods when applied to word-based classification tasks.

Problem

Research questions and friction points this paper is trying to address.

Evaluating reliability of feature attribution in speech classification

Assessing impact of input type on attribution method performance

Identifying reliable word-aligned perturbation for word-based tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Study impact of input type on attribution reliability

Evaluate aggregation and perturbation timespan effects

Find word-aligned perturbation reliable for word tasks

🔎 Similar Papers

Can Authorship Attribution Models Distinguish Speakers in Speech Transcripts?