Bridging Unsupervised and Semi-Supervised Anomaly Detection: A Theoretically-Grounded and Practical Framework with Synthetic Anomalies

📅 2025-06-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the semi-supervised anomaly detection (AD) problem, focusing on the challenge of effectively combining genuine labeled anomalies with synthetically generated anomalies to improve model performance. We propose the first formal mathematical definition of this integration and develop a unified training framework that jointly models known genuine anomalies and controllable synthetic anomalies within a classifier-based AD paradigm. Theoretically, we prove that synthetic anomalies improve the modeling of low-density regions and establish, for the first time, optimal convergence guarantees for neural network classifiers in this setting. Our method integrates classification-based AD, controllable anomaly generation, and generalization error analysis. Empirical evaluation across five benchmark datasets demonstrates consistent and significant performance gains. Moreover, the proposed synthetic anomaly mechanism is broadly applicable and transfers to other classification-based AD methods.

📝 Abstract
Anomaly detection (AD) is a critical task across domains such as cybersecurity and healthcare. In the unsupervised setting, an effective and theoretically grounded principle is to train classifiers to distinguish normal data from (synthetic) anomalies. We extend this principle to semi-supervised AD, where the training data also include a small labeled subset of anomalies that may also appear at test time. We propose a theoretically grounded and empirically effective framework for semi-supervised AD that combines known and synthetic anomalies during training. To analyze this setting, we introduce the first mathematical formulation of semi-supervised AD, which generalizes unsupervised AD. We show that synthetic anomalies enable (i) better anomaly modeling in low-density regions and (ii) optimal convergence guarantees for neural network classifiers, the first theoretical result for semi-supervised AD. We empirically validate our framework on five diverse benchmarks, observing consistent performance gains. These improvements also extend beyond our theoretical framework to other classification-based AD methods, validating the generalizability of the synthetic anomaly principle in AD.
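As a rough illustration of the classification-based principle the abstract describes, the sketch below trains a binary classifier to separate a toy normal cluster from a mix of a few "known" anomalies and synthetic anomalies sampled uniformly over an enlarged bounding box around the normal data. Everything here is an assumption for illustration, not the paper's method: the toy data, the uniform-box sampling scheme, and the quadratic-feature logistic regression standing in for the paper's neural network classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy normal data: a tight 2-D Gaussian cluster (illustrative only).
X_normal = rng.normal(0.0, 0.5, size=(200, 2))

# A few known genuine anomalies: the labeled subset in semi-supervised AD.
X_known = rng.normal(3.0, 0.3, size=(10, 2))

# Synthetic anomalies: uniform samples over an enlarged bounding box,
# intended to cover low-density regions around the normal data.
lo = X_normal.min(axis=0) - 2.0
hi = X_normal.max(axis=0) + 2.0
X_synth = rng.uniform(lo, hi, size=(200, 2))

def featurize(P):
    """Quadratic features so a linear model can learn a radial boundary."""
    return np.hstack([P, P**2, np.ones((len(P), 1))])

# Binary labels: 0 = normal, 1 = anomaly (known genuine + synthetic).
X = np.vstack([X_normal, X_known, X_synth])
y = np.concatenate([np.zeros(len(X_normal)),
                    np.ones(len(X_known) + len(X_synth))])

# Plain logistic regression via gradient descent, a simple stand-in
# for the neural network classifier analyzed in the paper.
Phi = featurize(X)
w = np.zeros(Phi.shape[1])
for _ in range(3000):
    z = np.clip(Phi @ w, -30, 30)          # clip to avoid exp overflow
    p = 1.0 / (1.0 + np.exp(-z))
    w -= 0.1 * Phi.T @ (p - y) / len(y)    # gradient step on logistic loss

def anomaly_score(points):
    """Higher score = more anomalous (classifier's estimated P(anomaly))."""
    z = np.clip(featurize(points) @ w, -30, 30)
    return 1.0 / (1.0 + np.exp(-z))

# A point inside the normal cluster vs. one near the known anomalies.
scores = anomaly_score(np.array([[0.0, 0.0], [3.0, 3.0]]))
print(scores)
```

The synthetic uniform samples give the classifier negative evidence everywhere the normal data are sparse, which is the intuition behind the paper's claim that synthetic anomalies improve modeling of low-density regions.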
Problem

Research questions and friction points this paper is trying to address.

Extends unsupervised anomaly detection to semi-supervised setting
Proposes framework combining known and synthetic anomalies
Provides theoretical guarantees for neural network classifiers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines known and synthetic anomalies for training
Introduces first mathematical semi-supervised AD formulation
Uses synthetic anomalies for optimal convergence guarantees
Matthew Lau
School of Cybersecurity and Privacy, Georgia Institute of Technology
Tian-Yi Zhou
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology
Xiangchi Yuan
Georgia Institute of Technology
Representation Learning
Jizhou Chen
School of Cybersecurity and Privacy, Georgia Institute of Technology
Wenke Lee
School of Cybersecurity and Privacy, Georgia Institute of Technology
Xiaoming Huo
Professor, Georgia Institute of Technology
statistics · data science · machine learning