TaskFusion: Continual Anomaly Detection for Heterogeneous Tabular Data

📅 2026-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work tackles continual anomaly detection on heterogeneous tabular data, where feature spaces differ across tasks, distributions shift, and classes are severely imbalanced. The method uses the AGF model to align task-specific features in a shared space and adds TaskFusion augmentation, which combines boundary-aware interpolation within a task with cross-task mixing. Together with outlier exposure and tabular dataset distillation, which produces compact synthetic replay samples, this mitigates representation drift and catastrophic forgetting. Experiments on 21 heterogeneous datasets from multiple domains show substantial gains over sequential fine-tuning and other continual learning baselines, with stable detection and reduced forgetting.

📝 Abstract

Continual anomaly detection in tabular data is challenging and remains largely underexplored, particularly in settings with heterogeneous feature schemas, distribution shifts, and severe class imbalance. In many real-world applications, data arrive sequentially from diverse domains, rendering conventional continual learning methods ineffective due to their reliance on a fixed input space. We propose a continual learning (CL) method that addresses these challenges and learns continually from a sequence of different tasks. Our method consists of three main parts: the AGF model, TaskFusion augmentation, and outlier exposure. The AGF model maps task-specific features into a shared space, aligns distributions to reduce representation drift, and learns anomaly decision boundaries in the aligned space. To improve stability, we introduce TaskFusion augmentation, which combines boundary-aware interpolation within tasks, to refine the model's anomaly boundaries, with cross-task mixing, to transfer anomaly structure across datasets. To handle class imbalance and memory constraints, we employ tabular dataset distillation to store compact synthetic replay samples, which are used jointly with augmented data in an outlier exposure objective for robust anomaly detection. We evaluate the approach on 21 heterogeneous datasets across multiple domains. Results show that our approach substantially improves continual anomaly detection performance over sequential fine-tuning and other CL baselines while reducing catastrophic forgetting and maintaining stable detection across heterogeneous datasets.
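
The abstract does not give the exact form of the AGF mapping or of the TaskFusion interpolation, so the following is only a minimal sketch of the general idea under stated assumptions: a per-task linear projection stands in for the shared-space mapping, and a generic mixup-style convex combination stands in for both within-task and cross-task mixing. The names `project`, `interpolate`, `W_task_a`, and `W_task_b`, and all numeric values, are hypothetical and not taken from the paper.

```python
def project(x, weights):
    """Map a task-specific feature vector into the shared space.

    ASSUMPTION: a plain linear map. `weights` is a hypothetical per-task
    matrix with one row per shared dimension and one column per input feature.
    """
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the input width")
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def interpolate(z_a, z_b, lam):
    """Mixup-style convex combination of two shared-space embeddings.

    ASSUMPTION: a generic interpolation, not the paper's boundary-aware rule.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(z_a) != len(z_b):
        raise ValueError("embeddings must share the same dimension")
    return [lam * a + (1.0 - lam) * b for a, b in zip(z_a, z_b)]


# Two tasks with different schemas (3 features vs. 2 features),
# both projected into the same 2-dimensional shared space.
W_task_a = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
W_task_b = [[2.0, 0.0], [0.0, 2.0]]

z_normal_a = project([1.0, 2.0, 2.0], W_task_a)   # normal sample, task A
z_anomaly_a = project([4.0, 0.0, 0.0], W_task_a)  # anomaly, task A
z_anomaly_b = project([3.0, 0.5], W_task_b)       # anomaly, task B

# Within-task mixing: a synthetic point between a normal sample and an anomaly.
within_task = interpolate(z_normal_a, z_anomaly_a, lam=0.5)

# Cross-task mixing: only possible because both embeddings live in the shared space.
cross_task = interpolate(z_normal_a, z_anomaly_b, lam=0.75)

print(within_task, cross_task)
```

The sketch is meant to show why the shared space matters: samples from schemas of different widths cannot be mixed directly, but their projections can. How the method chooses which pairs to mix, and how the mixed points enter the outlier exposure objective, is not specified in the abstract and is left out here.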

Problem

Research questions and friction points this paper is trying to address.

continual anomaly detection

heterogeneous tabular data

distribution shift

class imbalance

feature schema heterogeneity

Innovation

Methods, ideas, or system contributions that make the work stand out.

continual anomaly detection

heterogeneous tabular data

feature alignment