🤖 AI Summary
District heating data frequently suffer from noise, missing values, and temporal misalignment, which can substantially degrade the performance of AI-based heat demand forecasting. To address this, we propose a human-in-the-loop visual diagnostic framework that combines anomaly-sensitive KPIs (such as skewness and the modified z-score) with multidimensional time-series visualizations (time-series plots, heatmaps, box plots, histograms, and correlation matrices), implemented as an extensible, reusable, web-based interactive dashboard. The framework enables domain experts to efficiently identify and characterize data quality issues. Applied to more than four years of hourly real-world data from nearly 7,000 heat meters operated by a Danish district heating provider, it uncovers diverse systemic data defects. These findings are intended to guide future data cleaning strategies that improve the accuracy, stability, and generalizability of LSTM and GRU forecasting models. The work contributes a scalable, generalizable framework for data quality assurance in AI-driven energy management systems.
📝 Abstract
High-quality data is a prerequisite for training reliable Artificial Intelligence (AI) models in the energy domain. In district heating networks, sensor and metering data often suffer from noise, missing values, and temporal inconsistencies, which can significantly degrade model performance. This paper presents a systematic approach for evaluating and improving data quality using visual diagnostics, implemented through an interactive web-based dashboard. The dashboard employs Python-based visualization techniques, including time series plots, heatmaps, box plots, histograms, and correlation matrices, together with anomaly-sensitive KPIs such as skewness and anomaly detection based on modified z-scores. These tools allow human experts to inspect and interpret data anomalies, enabling a human-in-the-loop strategy for data quality assessment. The methodology is demonstrated on a real-world dataset from a Danish district heating provider, covering over four years of hourly data from nearly 7,000 meters. The findings show how visual analytics can uncover systemic data issues; in future work, these insights can guide data cleaning strategies that enhance the accuracy, stability, and generalizability of Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models for heat demand forecasting. The study contributes a scalable, generalizable framework for visual data inspection and underlines the critical role of data quality in AI-driven energy management systems.