🤖 AI Summary
This work addresses inconsistent feature spaces, distribution shifts, and severe class imbalance in heterogeneous tabular data under continual learning settings by proposing an efficient and stable continual anomaly detection method. The approach introduces the AGF model, which maps task-specific features into a shared space and aligns their distributions, and Taskfusion augmentation, which combines intra-task boundary-aware interpolation with cross-task mixing. Tabular dataset distillation produces compact synthetic replay samples, which are used together with the augmented data in an outlier exposure objective to mitigate representation drift and catastrophic forgetting. Experiments on 21 heterogeneous, cross-domain datasets show that the method substantially outperforms sequential fine-tuning and other continual learning baselines, improving detection performance while maintaining stability and reducing forgetting.
📝 Abstract
Continual anomaly detection in tabular data is challenging and remains largely underexplored, particularly in settings with heterogeneous feature schemas, distribution shifts, and severe class imbalance. In many real-world applications, data arrive sequentially from diverse domains, which renders conventional continual learning methods ineffective because they rely on a fixed input space. We propose a continual learning (CL) method that addresses these challenges and learns continually from different tasks. Our method consists of three main parts: the AGF model, Taskfusion augmentation, and outlier exposure. The AGF model maps task-specific features into a shared space, aligns distributions to reduce representation drift, and learns anomaly decision boundaries in the aligned space. To improve stability, we introduce Taskfusion augmentation, which combines boundary-aware interpolation within tasks, to refine the model's anomaly boundaries, with cross-task mixing, to transfer anomaly structure across datasets. To handle class imbalance and memory constraints, we employ tabular dataset distillation to store compact synthetic replay samples, which are used jointly with the augmented data in an outlier exposure objective for robust anomaly detection. We evaluate the approach on 21 heterogeneous datasets across multiple domains. The results show that our approach substantially improves continual anomaly detection performance over sequential fine-tuning and other CL baselines, while reducing catastrophic forgetting and maintaining stable detection across heterogeneous datasets.
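To make the two augmentation steps named in the abstract more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the interpolation range `[low, high]`, and the Beta(0.4, 0.4) mixing weights are all hypothetical choices, and the inputs are assumed to be vectors that some upstream model (such as the AGF model) has already mapped into a shared space of equal dimension.

```python
import random


def interpolate(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def boundary_aware_interpolation(normals, anomalies, rng, low=0.5, high=0.9):
    """Intra-task step (assumed form): pull each anomaly partway toward a
    randomly chosen normal sample of the same task. The weight on the anomaly
    is drawn from [low, high], so the synthetic point stays nearer the anomaly
    while moving toward the region between the two classes."""
    return [
        interpolate(a, rng.choice(normals), rng.uniform(low, high))
        for a in anomalies
    ]


def cross_task_mix(anomalies_a, anomalies_b, rng, alpha=0.4):
    """Cross-task step (assumed form): mixup-style blend of anomalies drawn
    from two tasks. Both inputs must already live in the same shared space."""
    n = min(len(anomalies_a), len(anomalies_b))
    picks_a = rng.sample(anomalies_a, n)
    picks_b = rng.sample(anomalies_b, n)
    return [
        interpolate(a, b, rng.betavariate(alpha, alpha))
        for a, b in zip(picks_a, picks_b)
    ]


# Hypothetical 2-D shared-space embeddings for two tasks.
rng = random.Random(0)
task_a_normals = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
task_a_anomalies = [[3.0, 3.2], [2.8, 3.1]]
task_b_anomalies = [[-3.0, 2.9], [-3.2, 3.3], [-2.7, 3.0]]

near_boundary = boundary_aware_interpolation(task_a_normals, task_a_anomalies, rng)
mixed = cross_task_mix(task_a_anomalies, task_b_anomalies, rng)
```

In a setup like the one the abstract describes, synthetic points of this kind would be combined with the distilled replay samples and treated as exposed outliers during training. How the paper actually weights or selects them is not stated in the abstract, so the parameters above should be read as placeholders.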