🤖 AI Summary
To address complex feature interactions and scarce anomalous samples in tabular anomaly detection, this paper proposes the Diffusion-Scheduled Denoising Autoencoder (DDAE), a denoising autoencoder framework that integrates diffusion-model noise scheduling with contrastive learning. DDAE replaces the fixed-magnitude noise of a conventional denoising autoencoder with timestep-scheduled, diffusion-style noise injection in the encoding process, and uses contrastive learning to improve anomaly detection. On 57 ADBench datasets, DDAE outperforms the baselines in the semi-supervised setting and is competitive in the unsupervised setting. Reported gains are up to 65% in PR-AUC and 16% in ROC-AUC over state-of-the-art autoencoder baselines, and up to 9% in PR-AUC and 6% in ROC-AUC over diffusion model baselines. The authors also find that higher noise levels help unsupervised training, while lower noise with a linear schedule works best in the semi-supervised setting.
📝 Abstract
Anomaly detection in tabular data remains challenging due to complex feature interactions and the scarcity of anomalous examples. Denoising autoencoders rely on fixed-magnitude noise, which limits their adaptability to diverse data distributions. Diffusion models introduce scheduled noise and iterative denoising but lack explicit reconstruction mappings. We propose the Diffusion-Scheduled Denoising Autoencoder (DDAE), a framework that integrates diffusion-based noise scheduling and contrastive learning into the encoding process to improve anomaly detection. We evaluate DDAE on 57 datasets from ADBench. Our method outperforms the baselines in semi-supervised settings and achieves competitive results in unsupervised settings, improving PR-AUC by up to 65% and ROC-AUC by 16% over state-of-the-art autoencoder baselines, and PR-AUC by up to 9% and ROC-AUC by 6% over state-of-the-art diffusion model baselines. We observe that higher noise levels benefit unsupervised training, while lower noise with linear scheduling is optimal in semi-supervised settings. These findings underscore the importance of principled noise strategies in tabular anomaly detection.
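To make the contrast between fixed-magnitude and scheduled noise concrete, here is a minimal sketch of diffusion-style noise injection on tabular rows. It assumes the standard DDPM forward process with a linear beta schedule; the abstract does not specify the schedule parameters or the exact corruption rule, so the function names, `beta_start`, `beta_end`, and the number of steps are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Hypothetical defaults borrowed from common DDPM setups, not from the paper.
    return np.linspace(beta_start, beta_end, num_steps)

def signal_retention(betas):
    # alpha_bar[t] = prod_{s <= t} (1 - beta_s): fraction of signal variance kept at step t.
    return np.cumprod(1.0 - betas)

def add_scheduled_noise(x0, t, alpha_bar, rng):
    # Corrupt each row with Gaussian noise whose magnitude depends on its timestep:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    a = alpha_bar[t][:, None]              # shape (batch, 1), broadcasts over features
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

rng = np.random.default_rng(0)
num_steps = 100
alpha_bar = signal_retention(linear_beta_schedule(num_steps))

x0 = rng.standard_normal((8, 5))           # 8 standardized rows, 5 features
t = rng.integers(0, num_steps, size=8)     # one timestep per row
x_noisy, eps = add_scheduled_noise(x0, t, alpha_bar, rng)
```

A denoising autoencoder trained on such inputs would take `x_noisy` and be asked to reconstruct `x0`, with the reconstruction error serving as an anomaly score. Capping the sampled `t` at a smaller value is one way to obtain the lower noise levels that the abstract reports as preferable in the semi-supervised setting.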