🤖 AI Summary
Addressing the challenge of modeling concept drift in non-stationary data streams, this paper proposes the Frequency-Filtered Meta-Descriptor (FFMD), the first meta-descriptor framework to incorporate frequency-domain analysis. FFMD applies the Discrete Fourier Transform (DFT) to feature sequences, then adaptively selects discriminative frequency components based on their variance characteristics—thereby constructing dynamic, semantically interpretable meta-features. These meta-features support downstream tasks including concept clustering, evolutionary visualization, and spatial-domain reconstruction. Compared against two state-of-the-art (SOTA) methods and a PCA-based baseline, FFMD achieves statistically significant improvements in post-hoc concept identification. Moreover, on real-world streaming datasets, FFMD successfully uncovers interpretable, stage-wise concept evolution patterns. By bridging spectral analysis with meta-feature learning, FFMD establishes a novel paradigm for understanding and diagnosing non-stationary stream data.
📝 Abstract
Concept drift is among the primary challenges faced by the data stream processing methods. The drift detection strategies, designed to counteract the negative consequences of such changes, often rely on analyzing the problem metafeatures. This work presents the Frequency Filtering Metadescriptor -- a tool for characterizing the data stream that searches for the informative frequency components visible in the sample's feature vector. The frequencies are filtered according to their variance across all available data batches. The presented solution is capable of generating a metadescription of the data stream, separating chunks into groups describing specific concepts on its basis, and visualizing the frequencies in the original spatial domain. The experimental analysis compared the proposed solution with two state-of-the-art strategies and with the PCA baseline in the post-hoc concept identification task. The research is followed by the identification of concepts in the real-world data streams. The generalization in the frequency domain adapted in the proposed solution allows to capture the complex feature dependencies as a reduced number of frequency components, while maintaining the semantic meaning of data.