Anomaly Detection with Adaptive and Aggressive Rejection for Contaminated Training Data

📅 2025-11-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Anomaly detectors are usually trained on data assumed to be normal, yet that data is often contaminated with anomalous samples. Conventional methods handle this by assuming a predefined contamination rate, which suits them poorly to real-world settings where the contamination level is unknown and variable, especially when the normal and anomalous distributions overlap substantially. The paper proposes Adaptive and Aggressive Rejection (AAR), which combines hard rejection, based on modified z-score thresholds, with soft rejection, based on thresholds derived from a Gaussian mixture model, to exclude contaminated samples dynamically while preserving normal data. Because the thresholds adapt to the data, AAR does not depend on a fixed, assumed contamination ratio. On two image datasets and thirty tabular datasets, AAR outperforms the state-of-the-art method by 0.041 AUROC.

📝 Abstract

Handling contaminated data poses a critical challenge in anomaly detection, as traditional models assume training on purely normal data. Conventional methods mitigate contamination by relying on fixed contamination ratios, but discrepancies between assumed and actual ratios can severely degrade performance, especially in noisy environments where normal and abnormal data distributions overlap. To address these limitations, we propose Adaptive and Aggressive Rejection (AAR), a novel method that dynamically excludes anomalies using a modified z-score and Gaussian mixture model-based thresholds. AAR effectively balances the trade-off between preserving normal data and excluding anomalies by integrating hard and soft rejection strategies. Extensive experiments on two image datasets and thirty tabular datasets demonstrate that AAR outperforms the state-of-the-art method by 0.041 AUROC. By providing a scalable and reliable solution, AAR enhances robustness against contaminated datasets, paving the way for broader real-world applications in domains such as security and healthcare.
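As a rough illustration of the hard-rejection idea, the sketch below applies a modified z-score to per-sample anomaly scores. The abstract does not give the exact formula or cutoff AAR uses, so the standard Iglewicz-Hoaglin form (0.6745 times the deviation from the median, divided by the median absolute deviation) and the conventional cutoff of 3.5 are assumptions here, and the function names are hypothetical.

```python
import statistics


def modified_z_scores(scores):
    """Modified z-score per sample: 0.6745 * (x - median) / MAD.

    Assumption: this is the common Iglewicz-Hoaglin definition, not
    necessarily the variant used in the paper.
    """
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    if mad == 0:
        # All deviations collapse to zero; nothing can be ranked as extreme.
        return [0.0 for _ in scores]
    return [0.6745 * (s - med) / mad for s in scores]


def hard_reject(scores, cutoff=3.5):
    """Return indices of samples kept, i.e. whose modified z-score <= cutoff.

    Only unusually HIGH anomaly scores are rejected (one-sided test).
    The cutoff of 3.5 is a conventional default, assumed for illustration.
    """
    z = modified_z_scores(scores)
    return [i for i, zi in enumerate(z) if zi <= cutoff]


# Hypothetical anomaly scores: index 6 is far above the rest.
example_scores = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.95, 0.11]
kept_indices = hard_reject(example_scores)
```

Because the median and MAD are robust to a minority of extreme values, the cutoff adapts to the score distribution rather than to an assumed fraction of anomalies.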

Problem

Research questions and friction points this paper is trying to address.

Detecting anomalies when training data contains contaminated samples

Overcoming performance degradation from mismatched contamination ratio assumptions

Addressing overlapping distributions in noisy anomaly detection environments
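The mismatch problem listed above can be made concrete with a toy example. The sketch below is not from the paper: it implements a generic fixed-ratio rejection rule (drop the top fraction of samples by anomaly score) and uses made-up scores in which three of ten samples are anomalous. The function name and data are hypothetical.

```python
def reject_fixed_ratio(scores, assumed_ratio):
    """Drop the highest-scoring samples according to an assumed contamination ratio.

    Returns the indices of the samples that are kept.
    """
    n_reject = int(round(assumed_ratio * len(scores)))
    # Indices ordered from most to least anomalous.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rejected = set(order[:n_reject])
    return [i for i in range(len(scores)) if i not in rejected]


# Toy data: indices 7, 8, 9 are anomalies, so the true ratio is 0.3.
toy_scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.11, 0.14, 0.90, 0.85, 0.80]
true_anomalies = {7, 8, 9}

# Assumed ratio too low: some anomalies stay in the training set.
kept_low = reject_fixed_ratio(toy_scores, assumed_ratio=0.1)
leftover_anomalies = true_anomalies & set(kept_low)

# Assumed ratio too high: normal samples are thrown away.
kept_high = reject_fixed_ratio(toy_scores, assumed_ratio=0.5)
lost_normals = (set(range(len(toy_scores))) - true_anomalies) - set(kept_high)
```

With the ratio underestimated, two anomalies remain in the kept set; with it overestimated, two normal samples are discarded. Either error is the kind of assumed-versus-actual discrepancy the abstract says degrades performance.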

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic anomaly exclusion using modified z-score

Gaussian mixture model-based threshold integration

Hard and soft rejection strategy balancing
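One plausible reading of a Gaussian mixture model-based threshold is sketched below: fit a two-component mixture to one-dimensional anomaly scores and reject samples whose posterior probability of belonging to the higher-mean component exceeds 0.5. The abstract does not describe how AAR actually derives its threshold, so the two-component choice, the 0.5 posterior cutoff, the min/max initialization, and all function names are assumptions for illustration only.

```python
import math
import statistics


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(xs, n_iter=100, var_floor=1e-6):
    """Plain EM for a 1-D two-component Gaussian mixture.

    Component 0 starts at the minimum score and component 1 at the maximum,
    so component 1 is intended to capture the high-score (anomalous) mode.
    """
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]
    start_var = max(statistics.pvariance(xs), var_floor)
    variances = [start_var, start_var]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component is empty; keep its previous parameters
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, var_floor)
    return weights, means, variances


def soft_reject(xs, posterior_cutoff=0.5):
    """Return indices rejected because they likely belong to the high-mean component."""
    weights, means, variances = fit_two_gaussians(xs)
    high = 0 if means[0] > means[1] else 1
    rejected = []
    for i, x in enumerate(xs):
        p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(p) or 1e-300
        if p[high] / total > posterior_cutoff:
            rejected.append(i)
    return rejected


# Hypothetical anomaly scores: the last three form a separate high-score mode.
gmm_scores = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.08, 0.11, 0.90, 0.95, 0.88]
rejected_indices = soft_reject(gmm_scores)
```

A posterior-based rule like this gives a probabilistic boundary between the two modes, which is gentler than a hard cutoff when the normal and anomalous score distributions overlap.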
