🤖 AI Summary
This work proposes SCIL, a framework designed to address the model instability and unreliable detection that concept drift, class imbalance, label scarcity, and the continual emergence of new classes cause in data streams. SCIL is the first approach to jointly tackle these issues within the streaming class-incremental learning setting. It integrates an autoencoder with a multi-layer perceptron, uses a dual loss combining classification and reconstruction, and trains online with corrected pseudo-labels. To handle dynamically evolving class distributions, SCIL incorporates a class queue management mechanism and an oversampling strategy. Experiments on multiple real-world and synthetic datasets show that SCIL outperforms strong baselines and state-of-the-art methods, with improved robustness and accuracy in dynamic environments.
📝 Abstract
In today's connected world, the generation of massive streaming data across diverse domains has become commonplace. Concept drift, class imbalance, label scarcity, and new class emergence jointly degrade representation stability, bias learning toward outdated distributions, and reduce the resilience and reliability of detection in dynamic environments. This paper proposes SCIL (Streaming Class-Incremental Learning) to address these challenges. The SCIL framework integrates an autoencoder (AE) with a multi-layer perceptron for multi-class prediction, uses a dual-loss strategy (classification and reconstruction) for prediction and new class detection, employs corrected pseudo-labels for online training, manages classes with queues, and applies oversampling to handle imbalance. Ablation studies elucidate the rationale behind the method's structure, and a comprehensive experimental evaluation is performed on both real-world and synthetic datasets that feature class imbalance, incremental classes, and concept drift. Our results demonstrate that SCIL outperforms strong baselines and state-of-the-art methods. In keeping with our commitment to Open Science, we make our code and datasets available to the community.
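The abstract names three mechanisms without detailing them: a dual loss, per-class queues, and oversampling. The sketch below is one possible reading of those ideas in plain Python, not the authors' implementation. The names `dual_loss`, `ClassQueues`, and `balanced_batch`, the weighting term `lam`, and the queue `capacity` are all hypothetical, and the choice of cross-entropy plus mean squared error is an assumption.

```python
import math
import random
from collections import deque


def dual_loss(probs, label, x, x_hat, lam=1.0):
    """Classification loss plus weighted reconstruction loss.

    Assumption: cross-entropy for the classifier output and mean squared
    error for the autoencoder output, combined with a hypothetical weight
    `lam`. The abstract does not state how the two terms are combined.
    """
    classification = -math.log(max(probs[label], 1e-12))
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return classification + lam * reconstruction


class ClassQueues:
    """One bounded FIFO queue per class; an unseen label opens a new queue."""

    def __init__(self, capacity=100):
        self.capacity = capacity  # hypothetical per-class memory size
        self.queues = {}

    def add(self, x, label):
        # A label seen for the first time gets its own queue, so new
        # classes can be added while the stream is running.
        if label not in self.queues:
            self.queues[label] = deque(maxlen=self.capacity)
        # Once a queue is full, deque(maxlen=...) drops the oldest item.
        self.queues[label].append(x)

    def balanced_batch(self, per_class, rng=random):
        """Oversample: draw `per_class` items from every queue with replacement."""
        batch = []
        for label, queue in self.queues.items():
            items = list(queue)
            batch.extend((rng.choice(items), label) for _ in range(per_class))
        return batch
```

In this sketch, bounding each queue keeps only recent instances, which is one simple way to limit the influence of outdated distributions under drift. Sampling with replacement gives a class with a single stored instance the same share of a training batch as a frequent class.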