A Comparative Analysis of Influence Signals for Data Debugging

📅 2025-06-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work systematically evaluates influence-based signals, such as Self-Influence and Average Absolute Influence, for detecting training data glitches (mislabeled samples and anomalous, i.e., out-of-distribution, samples). The evaluation covers image and tabular modalities, both when models are trained from scratch and when large pretrained models are fine-tuned. It presents the first unified, cross-modal, cross-architecture analysis under a single influence estimator (TracIn), and it uncovers two fundamental failure mechanisms: missing training dynamics and influence cancellation. Experiments show that Self-Influence is robust and effective for detecting label errors, whereas none of the existing influence signals reliably identifies anomalous samples. The study traces these failures to design flaws in current influence metrics. They ignore how a sample's influence evolves during training, and positive and negative scores can cancel out when accumulated. Its results serve as empirical benchmarks for trustworthy data debugging.
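The signals named above can be made concrete with a small sketch. It assumes TracIn's checkpoint approximation, in which the influence of training sample i on sample j is the sum, over saved checkpoints, of the learning rate times the dot product of the two loss gradients. The gradient layout (`grads[t][i]`), the helper names, and the toy numbers are hypothetical. Average Absolute Influence is assumed here to be the mean absolute TracIn score of a sample on every other training sample; the paper's exact definitions may differ.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical layout: grads[t][i] is the loss gradient of training sample i
# at saved checkpoint t; lrs[t] is the learning rate used at that checkpoint.
Grads = List[List[Vector]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def tracin(grads: Grads, lrs: List[float], i: int, j: int) -> float:
    """Checkpoint TracIn score of sample i on sample j:
    sum over checkpoints t of lrs[t] * <grad_i(t), grad_j(t)>."""
    return sum(lr * dot(g[i], g[j]) for g, lr in zip(grads, lrs))


def self_influence(grads: Grads, lrs: List[float], i: int) -> float:
    """Self-Influence: the TracIn score of a sample on itself, i.e. the
    learning-rate-weighted sum of its squared gradient norms."""
    return tracin(grads, lrs, i, i)


def average_absolute_influence(grads: Grads, lrs: List[float], i: int) -> float:
    """Assumed definition: mean |TracIn(i, j)| over all other training samples j."""
    others = [j for j in range(len(grads[0])) if j != i]
    return sum(abs(tracin(grads, lrs, i, j)) for j in others) / len(others)


def rank_by(signal: Callable[[Grads, List[float], int], float],
            grads: Grads, lrs: List[float]) -> List[int]:
    """Training-sample indices ordered from highest to lowest signal value,
    i.e. most suspicious first."""
    return sorted(range(len(grads[0])), key=lambda i: signal(grads, lrs, i), reverse=True)


# Toy example: 2 checkpoints, 4 samples with 2-dimensional gradients.
# Sample 3 keeps a large gradient, as a mislabeled sample might.
grads = [
    [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [1.0, -0.8]],
    [[0.05, 0.0], [0.0, 0.05], [0.05, 0.05], [0.9, -0.7]],
]
lrs = [0.1, 0.1]
print(rank_by(self_influence, grads, lrs))
print(rank_by(average_absolute_influence, grads, lrs))
```

In this toy setup, sample 3, whose gradients stay much larger than the others, ranks first under both signals. Self-Influence needs only the sample's own gradients. The assumed Average Absolute Influence needs all pairwise scores, which is quadratic in the number of training samples.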

📝 Abstract

Improving the quality of training samples is crucial for the reliability and performance of ML models. In this paper, we conduct a comparative evaluation of influence-based signals for debugging training data. These signals can potentially identify both mislabeled and anomalous samples in a possibly noisy training set while the models are being built, and hence alleviate the need for dedicated glitch detectors. Several influence-based signals (e.g., Self-Influence, Average Absolute Influence, Marginal Influence, GD-class) have recently been proposed in the literature. However, no experimental study has assessed their power to detect different glitch types (e.g., mislabeled and anomalous samples) under a common influence estimator (e.g., TracIn), across data modalities (image and tabular) and deep learning models (trained from scratch or fine-tuned foundation models). Through extensive experiments, we show that signals like Self-Influence effectively detect mislabeled samples, but none of the existing signals can detect anomalies. Existing signals do not take into account the training dynamics, i.e., how a sample's influence on the model changes during training. Some signals also suffer from influence cancellation: the influence score becomes zero because positive and negative scores cancel out when they are accumulated, resulting in misleading influence attribution.
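The cancellation effect described above comes down to simple arithmetic, and a toy example makes it visible. The numbers below are hypothetical per-checkpoint TracIn contributions of one training sample on another. The sample acts as a proponent early in training (positive scores) and as an opponent later (negative scores). Summing the signed contributions yields zero, while summing magnitudes discards the sign. Keeping the per-checkpoint trajectory retains the training dynamics that a single aggregate loses. The same arithmetic applies if the accumulation runs over other samples rather than over checkpoints. The helper names are illustrative, not taken from the paper.

```python
from typing import Dict, List


def signed_total(contribs: List[float]) -> float:
    """Accumulate signed contributions; opposite-sign terms cancel."""
    return sum(contribs)


def absolute_total(contribs: List[float]) -> float:
    """Accumulate magnitudes; no cancellation, but the sign is lost."""
    return sum(abs(c) for c in contribs)


def trajectory_summary(contribs: List[float]) -> Dict[str, float]:
    """Keep per-checkpoint information that a single aggregate discards."""
    return {
        "first": contribs[0],
        "last": contribs[-1],
        "max_abs": max(abs(c) for c in contribs),
        # Number of times the influence flips sign between consecutive checkpoints.
        "sign_changes": sum(1 for a, b in zip(contribs, contribs[1:]) if a * b < 0),
    }


# Hypothetical per-checkpoint contributions (lr_t * gradient dot product).
oscillating = [0.5, 0.25, -0.25, -0.5]
print(signed_total(oscillating))        # 0.0: the influence "disappears"
print(absolute_total(oscillating))      # 1.5
print(trajectory_summary(oscillating))
```

A sample scored 0.0 by the signed total does not lack influence. Its influence changed sign during training, which is the kind of misleading attribution the abstract describes.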

Problem

Research questions and friction points this paper is trying to address.

Evaluating influence signals for detecting mislabeled and anomalous training samples

Comparing the effectiveness of existing signals under a common estimator across data modalities and model types

Addressing limitations such as influence cancellation and the failure to account for training dynamics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Comparative evaluation of influence-based debugging signals

Self-Influence detects mislabeled samples effectively

Existing signals ignore training dynamics and are prone to influence cancellation

🔎 Similar Papers

Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis