🤖 AI Summary
This work addresses semi-supervised anomaly detection (AD), focusing on the challenge of effectively combining a limited set of labeled genuine anomalies with synthetically generated anomalies to enhance model performance. We propose the first formal mathematical definition of this setting and develop a unified training framework that jointly models known genuine anomalies and controllable synthetic anomalies within a classifier-based AD paradigm. Theoretically, we prove that synthetic anomalies improve the modeling of low-density regions and establish, for the first time, optimal convergence guarantees for neural network classifiers in this context. Our method unifies classification-based AD, controllable anomaly generation, and generalization error analysis. Empirical evaluation on five benchmark datasets demonstrates consistent and significant performance gains. Moreover, the proposed synthetic-anomaly mechanism is broadly applicable and transfers to other classification-based AD methods.
📝 Abstract
Anomaly detection (AD) is a critical task in domains such as cybersecurity and healthcare. In the unsupervised setting, an effective and theoretically grounded principle is to train classifiers to distinguish normal data from (synthetic) anomalies. We extend this principle to semi-supervised AD, where the training data also include a small labeled subset of anomalies of the kind that may appear at test time. We propose a theoretically grounded and empirically effective framework for semi-supervised AD that combines known and synthetic anomalies during training. To analyze this setting, we introduce the first mathematical formulation of semi-supervised AD, which generalizes the unsupervised case. Within this formulation, we show that synthetic anomalies enable (i) better anomaly modeling in low-density regions and (ii) optimal convergence guarantees for neural network classifiers -- the first such theoretical result for semi-supervised AD. We empirically validate our framework on five diverse benchmarks, observing consistent performance gains. These improvements also extend beyond our framework to other classification-based AD methods, confirming the generalizability of the synthetic-anomaly principle in AD.
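The training principle described in the abstract -- a binary classifier separating normal data from a mix of known genuine anomalies and synthetic anomalies sampled in low-density regions -- can be sketched with a toy example. This is an illustrative assumption-laden sketch, not the paper's method: the Gaussian "normal" data, the uniform synthetic-anomaly sampler over a bounding box, and the quadratic logistic-regression classifier are all stand-ins for the framework's data and neural network classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "normal" training data: a tight Gaussian cluster (assumption).
X_norm = rng.normal(0.0, 1.0, size=(500, 2))

# A few labeled genuine anomalies, as in the semi-supervised setting.
X_anom = rng.normal(6.0, 0.5, size=(10, 2))

# Synthetic anomalies: uniform samples over an enlarged bounding box of
# the normal data, so that low-density regions receive anomaly labels.
lo, hi = X_norm.min(axis=0) - 3.0, X_norm.max(axis=0) + 3.0
X_syn = rng.uniform(lo, hi, size=(500, 2))

# Binary labels: 0 = normal, 1 = anomaly (genuine or synthetic).
X = np.vstack([X_norm, X_anom, X_syn])
y = np.concatenate([np.zeros(len(X_norm)), np.ones(len(X_anom) + len(X_syn))])

# Quadratic feature map so a linear classifier can carve out the
# (roughly radial) normal region; standardized for stable training.
def raw_features(X):
    return np.hstack([X, X**2])

F = raw_features(X)
mu, sd = F.mean(axis=0), F.std(axis=0)

def features(X):
    Z = (raw_features(np.atleast_2d(X)) - mu) / sd
    return np.hstack([Z, np.ones((len(Z), 1))])  # append a bias column

# Plain logistic regression via full-batch gradient descent (a simple
# stand-in for the neural network classifier analyzed in the paper).
Phi = features(X)
w = np.zeros(Phi.shape[1])
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-Phi @ w))
    w -= 0.2 * Phi.T @ (p - y) / len(y)

def anomaly_score(x):
    """Higher score = more anomalous (predicted probability of class 1)."""
    return float(1.0 / (1.0 + np.exp(-(features(x) @ w)[0])))

print(anomaly_score(np.array([0.0, 0.0])))  # near the normal cluster: low
print(anomaly_score(np.array([6.0, 6.0])))  # deep in a low-density region: high
```

The synthetic anomalies are what let the classifier assign high scores throughout the low-density region, even far from the ten labeled anomalies -- the intuition behind the paper's low-density-region result.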