Data Cleaning of Data Streams

📅 2025-07-28
🤖 AI Summary
Problem: Traditional static data cleaning methods behave inconsistently and adapt poorly when applied to dynamic data streams, such as continuous sensor measurements over time (e.g., temperature, illumination). This makes real-time error detection and correction difficult.

Method: This work identifies the inherent non-stationarity of stream cleaning and establishes a theoretical framework designed specifically for streaming environments. The framework formally characterizes the challenges unique to streams: timeliness constraints, state evolution, and incremental validation. Based on this framework, the authors design a prototype system and evaluate it empirically across diverse streaming scenarios.

Contribution/Results: The analysis reveals substantial performance volatility in existing approaches and pinpoints three critical factors: windowing strategy, error propagation patterns, and data arrival rate. The framework provides a scalable theoretical foundation and a rigorous evaluation methodology for stream data cleaning, enabling principled design and comparative assessment of future techniques.
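To make the role of the windowing strategy concrete, the following minimal sketch applies the same outlier rule once to a complete dataset and then to tumbling windows of two different sizes. It is an illustration only and not the prototype from the paper: the z-score rule, the threshold of 2.0, the function names, and the temperature readings are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=2.0):
    """Return the indices of values whose z-score exceeds the threshold.

    Hypothetical cleaning rule for illustration; not taken from the paper.
    """
    if len(values) < 2:
        return set()
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return set()
    return {i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold}


def flag_outliers_windowed(values, window_size, threshold=2.0):
    """Apply flag_outliers independently to consecutive tumbling windows."""
    flagged = set()
    for start in range(0, len(values), window_size):
        window = values[start:start + window_size]
        # Translate window-local indices back to positions in the stream.
        flagged |= {start + i for i in flag_outliers(window, threshold)}
    return flagged


# Made-up temperature readings with two suspicious values (35.0 and 26.0).
readings = [20.1, 20.3, 20.2, 20.4, 35.0, 20.3,
            20.2, 20.1, 20.4, 20.3, 20.2, 26.0]

static_flags = flag_outliers(readings)               # whole dataset at once
small_window = flag_outliers_windowed(readings, 4)   # tumbling windows of 4
large_window = flag_outliers_windowed(readings, 6)   # tumbling windows of 6

print("static:  ", sorted(static_flags))
print("window 4:", sorted(small_window))
print("window 6:", sorted(large_window))
```

On this toy input the three runs disagree: the static pass flags only the reading 35.0, the windows of size 4 flag nothing (with four values, a population z-score cannot exceed the square root of 3), and the windows of size 6 flag both 35.0 and 26.0. The same rule therefore yields different cleaning results depending solely on how the stream is cut into windows.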

📝 Abstract
Streaming data can arise from a variety of contexts. Important use cases are continuous sensor measurements such as temperature, light or radiation values. Such streaming data may also contain data errors that should be cleaned before further use. Many studies from science and practice focus on data cleaning in a static context. However, in terms of data cleaning, streaming data has particularities that distinguish it from static data. In this paper, we have therefore undertaken an intensive exploration of data cleaning of data streams. We provide a detailed analysis of the applicability of data cleaning to data streams. Our theoretical considerations are evaluated in comprehensive experiments. Using a prototype framework, we show that cleaning is not consistent when working with data streams. An additional contribution is the investigation of requirements for streaming technologies in the context of data cleaning.
Problem

Research questions and friction points this paper is trying to address.

Addressing data cleaning challenges in streaming data
Analyzing applicability of cleaning methods for data streams
Investigating requirements for streaming technologies in cleaning
Innovation

Methods, ideas, or system contributions that make the work stand out.

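Provides a detailed analysis of the applicability of data cleaning to data streams
Shows, using a prototype framework, that cleaning results are not consistent on data streams
Investigates the requirements that data cleaning places on streaming technologies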

Authors
Valerie Restat
University of Hagen
Niklas Rodenhausen
University of Hagen
Carina Antonin
University of Hagen
Uta Störl
Professor of Computer Science, University of Hagen
Database Systems, NoSQL, Big Data