Synthetic Non-stationary Data Streams for Recognition of the Unknown

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Open-set recognition in data streams faces two challenges at once, concept drift and the emergence of unknown classes, yet existing benchmarks do not model the two jointly.
Method: We propose the first synthetic data stream generation framework that models concept drift and the incremental emergence of unknown classes together. It uses generative modeling to produce controllable, reproducible benchmark streams and pairs them with unsupervised drift detectors, so that known-class classification and unknown-class identification can be evaluated jointly.
Contributions/Results: (1) A synthetic stream generator that simulates concept drift and the gradual appearance of unknown classes in the same stream; (2) an empirical demonstration that unsupervised drift detectors support both drift localization and novelty detection; (3) a standardized evaluation protocol and benchmark suite for open-set stream learning. Experiments indicate that the framework makes the assessment of existing methods more reproducible, fair, and robust.
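To make the idea of a stream with both phenomena concrete, here is a minimal sketch of a chunked generator in which a sudden concept drift and a late-appearing class occur in the same stream. It is not the paper's generator: the function name `make_stream`, the Gaussian class model, the centroid positions, and every parameter (`drift_chunk`, `novel_chunk`, `drift_offset`) are assumptions chosen only for illustration.

```python
import numpy as np


def make_stream(n_chunks=20, chunk_size=100, n_features=2,
                drift_chunk=8, novel_chunk=14, drift_offset=3.0, seed=0):
    """Yield (X, y) chunks with one sudden drift and one late-appearing class.

    All parameter names and default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical class centroids: classes 0 and 1 are known from the start,
    # class 2 plays the role of the novel (unknown) class.
    centers = np.array([
        np.full(n_features, 0.0),
        np.full(n_features, 6.0),
        np.full(n_features, -6.0),
    ])
    for i in range(n_chunks):
        # The novel class is only sampled from novel_chunk onward.
        n_classes = 3 if i >= novel_chunk else 2
        y = rng.integers(0, n_classes, size=chunk_size)
        # Sudden drift: every class centroid is shifted from drift_chunk onward.
        shift = drift_offset if i >= drift_chunk else 0.0
        X = centers[y] + shift + rng.normal(size=(chunk_size, n_features))
        yield X, y


chunks = list(make_stream())
for i in (0, 8, 14):
    X, y = chunks[i]
    # Labels present in the chunk and the mean of class 0 (moves after the drift).
    print(i, np.unique(y), X[y == 0].mean(axis=0).round(1))
```

Fixing the seed makes the stream reproducible, and moving `drift_chunk` and `novel_chunk` controls whether the two events coincide or occur apart.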

📝 Abstract
The problem of data non-stationarity is commonly addressed in data stream processing. In a dynamic environment, methods must remain ready to analyze time-varying data at all times; hence, they should support incremental training and respond to concept drifts. An equally important form of variability in non-stationary data streams is the emergence of new, previously unknown classes. Methods often focus on only one of these two phenomena, the detection of concept drifts or the detection of novel classes, although both can occur in the same data stream. Additionally, with respect to previously unknown observations, open set recognition has become particularly important in recent years; here the goal is to classify efficiently within the known classes while recognizing objects outside the model's competence. This article presents a strategy for synthetic data stream generation in which both concept drifts and the emergence of new classes representing unknown objects occur. The presented research shows how unsupervised drift detectors address the task of detecting novelty and concept drifts and demonstrates how the generated data streams can be utilized in the open set recognition task.
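The point that a label-free detector can react to both a concept drift and a new class can be illustrated with a small sketch. The detector below is not one of the detectors studied in the paper; `MeanShiftDetector`, its z-score statistic, and its threshold are assumptions made for this example. It compares the feature means of each incoming chunk with a reference chunk and raises an alarm when they differ by more than the threshold, without ever looking at labels.

```python
import numpy as np


class MeanShiftDetector:
    """Label-free detector (illustrative): flags a chunk whose feature means
    deviate strongly from those of a reference chunk."""

    def __init__(self, threshold=4.0):
        self.threshold = threshold  # assumed alarm level on the z statistic
        self.reference = None

    def update(self, X):
        if self.reference is None:
            self.reference = X
            return False
        ref = self.reference
        # Standard error of the difference between the two chunk means, per feature.
        se = np.sqrt(ref.var(axis=0, ddof=1) / len(ref)
                     + X.var(axis=0, ddof=1) / len(X))
        z = np.abs(X.mean(axis=0) - ref.mean(axis=0)) / se
        detected = bool(z.max() > self.threshold)
        if detected:
            self.reference = X  # re-anchor on the new data distribution
        return detected


# Toy stream: chunks 0-4 stationary, 5-9 after a drift, 10-14 with a novel class.
rng = np.random.default_rng(0)
chunks = []
for i in range(15):
    if i < 5:
        X = rng.normal(0.0, 1.0, size=(200, 2))
    elif i < 10:
        X = rng.normal(2.0, 1.0, size=(200, 2))
    else:
        known = rng.normal(2.0, 1.0, size=(140, 2))
        novel = rng.normal(-6.0, 1.0, size=(60, 2))
        X = np.vstack([known, novel])
    chunks.append(X)

detector = MeanShiftDetector()
alarms = [i for i, X in enumerate(chunks) if detector.update(X)]
print(alarms)
```

Because the detector only sees the features, it cannot by itself tell a drift of known classes from the arrival of a new one; both events simply change the chunk distribution.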

Problem

Research questions and friction points this paper is trying to address.

Addressing data non-stationarity in dynamic stream environments
Detecting concept drifts and novel classes simultaneously
Enhancing open set recognition for unknown class objects

Innovation

Methods, ideas, or system contributions that make the work stand out.

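Synthetic data streams simulate both concept drifts and the emergence of new classes
Unsupervised drift detectors are examined for detecting both concept drifts and novelty
The generated streams are used to evaluate open set recognition of unknown classes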
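As a rough illustration of the open set recognition task, the sketch below classifies among known classes with a nearest-centroid rule and rejects samples that lie too far from every centroid. The class name `OpenSetCentroidClassifier`, the `UNKNOWN` label, and the quantile-based rejection radius are assumptions for this example and are not taken from the paper.

```python
import numpy as np

UNKNOWN = -1  # assumed label for samples outside the model's competence


class OpenSetCentroidClassifier:
    """Nearest-centroid classifier with distance-based rejection (illustrative)."""

    def __init__(self, quantile=0.99):
        self.quantile = quantile  # assumed share of training data kept inside the radius

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        # Distance of every training sample to the centroid of its own class.
        own = self.centroids_[np.searchsorted(self.classes_, y)]
        dist = np.linalg.norm(X - own, axis=1)
        self.radius_ = np.quantile(dist, self.quantile)
        return self

    def predict(self, X):
        # Pairwise distances, shape (n_samples, n_known_classes).
        dist = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        pred = self.classes_[dist.argmin(axis=1)]
        # Reject samples that are far from every known centroid.
        return np.where(dist.min(axis=1) > self.radius_, UNKNOWN, pred)


rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                     rng.normal(6.0, 1.0, size=(200, 2))])
y_train = np.repeat([0, 1], 200)
X_known = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                     rng.normal(6.0, 1.0, size=(100, 2))])
y_known = np.repeat([0, 1], 100)
X_novel = rng.normal(-6.0, 1.0, size=(100, 2))  # class never seen in training

clf = OpenSetCentroidClassifier().fit(X_train, y_train)
known_acc = (clf.predict(X_known) == y_known).mean()       # accuracy on known classes
unknown_rate = (clf.predict(X_novel) == UNKNOWN).mean()    # share of novel samples rejected
print(known_acc, unknown_rate)
```

Reporting the two numbers separately mirrors the two goals of the task: classifying correctly within the known classes and recognizing objects that belong to none of them.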