🤖 AI Summary
This work addresses noise and anomalies in multivariate time series from critical infrastructure such as electrical power grids, which degrade downstream task performance. It proposes CINDI (Conditional Imputation and Noisy Data Integrity), an unsupervised probabilistic framework that unifies anomaly detection and imputation in a single end-to-end system. The method uses conditional normalizing flows to model the exact conditional likelihood of the data, flags low-probability segments as anomalies, and iteratively samples statistically consistent replacements while preserving the underlying physical and statistical properties of the system. Experiments on real-world grid loss data from a Norwegian power distribution operator show robust performance compared to competitive baselines, and the authors present the framework as a scalable approach designed to generalize to other multivariate time series domains.
📝 Abstract
Real-world multivariate time series, particularly in critical infrastructure such as electrical power grids, are often corrupted by noise and anomalies that degrade the performance of downstream tasks. Standard data-cleaning approaches typically rely on disjoint strategies that detect errors with one model and impute replacements with another. Such approaches can fail to capture the full joint distribution of the data and ignore prediction uncertainty. This work introduces Conditional Imputation and Noisy Data Integrity (CINDI), an unsupervised probabilistic framework designed to restore data integrity in complex time series. Unlike fragmented approaches, CINDI unifies anomaly detection and imputation into a single end-to-end system built on conditional normalizing flows. By modeling the exact conditional likelihood of the data, the framework identifies low-probability segments and iteratively samples statistically consistent replacements. This allows CINDI to efficiently reuse learned information while preserving the underlying physical and statistical properties of the system. We evaluate the framework using real-world grid loss data from a Norwegian power distribution operator, though the methodology is designed to generalize to any multivariate time series domain. The results demonstrate that CINDI yields robust performance compared to competitive baselines, offering a scalable solution for maintaining reliability in noisy environments.
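The detect-then-resample loop described in the abstract can be illustrated with a deliberately small stand-in. The sketch below is not the CINDI architecture: it replaces the learned conditional normalizing flow with a single affine transformation x = a*c + b + s*z, with z drawn from a standard normal, whose parameters are fit by least squares. This is the simplest case in which the change-of-variables formula gives an exact conditional log-likelihood, log p(x | c) = log N(z; 0, 1) - log s. The context variable `c`, the threshold parameter `z_max`, the function names, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def fit_affine_flow(ctx, obs):
    """Least-squares fit of x = a*c + b + s*z with z ~ N(0, 1)."""
    n = len(obs)
    c_mean = sum(ctx) / n
    x_mean = sum(obs) / n
    sxx = sum((c - c_mean) ** 2 for c in ctx)
    sxy = sum((c - c_mean) * (x - x_mean) for c, x in zip(ctx, obs))
    a = sxy / sxx
    b = x_mean - a * c_mean
    resid = [x - (a * c + b) for c, x in zip(ctx, obs)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return a, b, s


def log_likelihood(x, c, params):
    """Exact log p(x | c) via change of variables: log N(z; 0, 1) - log s."""
    a, b, s = params
    z = (x - (a * c + b)) / s
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi) - math.log(s)


def detect_and_impute(ctx, obs, z_max=6.0, max_iter=10, seed=0):
    """Flag low-likelihood points, resample them, and refit until stable."""
    rng = random.Random(seed)
    x = list(obs)
    flagged = set()
    for _ in range(max_iter):
        # Refit on points that have never been flagged (hypothetical choice).
        keep = [i for i in range(len(x)) if i not in flagged]
        params = fit_affine_flow([ctx[i] for i in keep], [x[i] for i in keep])
        a, b, s = params
        # Threshold: log-density of a point z_max base-scale units from the mean.
        threshold = -0.5 * z_max ** 2 - 0.5 * math.log(2 * math.pi) - math.log(s)
        low = {i for i in range(len(x))
               if log_likelihood(x[i], ctx[i], params) < threshold}
        if not low:
            break
        # Replace low-likelihood values with samples from the fitted conditional.
        for i in sorted(low):
            x[i] = a * ctx[i] + b + s * rng.gauss(0.0, 1.0)
        flagged |= low
    return x, flagged


# Synthetic data (assumed): a target that depends linearly on a clean covariate.
data_rng = random.Random(42)
context = [data_rng.random() for _ in range(300)]
clean = [2.0 * c + 1.0 + data_rng.gauss(0.0, 0.1) for c in context]
anomaly_idx = [50, 150, 250]
corrupted = list(clean)
for i in anomaly_idx:
    corrupted[i] += 5.0  # inject large spikes

repaired, flagged = detect_and_impute(context, corrupted)
print("flagged indices:", sorted(flagged))
```

The first pass fits on contaminated data, so its scale estimate is inflated and the first replacements are coarse. Later passes refit without the flagged points and resample any replacement that still scores below the threshold, which is one simple way to read "iteratively samples statistically consistent replacements". A real conditional flow would swap the affine map for a learned invertible network while keeping the same likelihood-threshold-resample structure.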