🤖 AI Summary
This work addresses statistical disaggregation and data imputation for energy consumption data measured at several resolutions and tied together by equality constraints (e.g., hourly readings must sum to the daily total), where high-resolution variables such as hourly loads are inferred from low-resolution observations such as daily aggregates. The constraint set carries zero probability mass, so naive rejection sampling does not work. To overcome this, we propose a Monte Carlo sampling algorithm that combines Langevin diffusions with rejection sampling; it is exact for linear constraints and naturally handles multimodal distributions on arbitrary constraints. Experiments on electricity consumption datasets show better imputation accuracy and uncertainty estimation than naive and unconstrained baselines.
📝 Abstract
Equality-constrained models naturally arise in problems in which measurements are taken at different levels of resolution. The challenge in this setting is that the models usually induce an intractable joint distribution. Sampling from the joint distribution with a Monte Carlo approach instead is also challenging: for example, naive rejection sampling does not work when the probability mass of the constraint set is zero. A typical example of such a constrained problem is learning energy consumption at a higher resolution from data at a lower resolution, e.g., decomposing a daily reading into readings at a finer level. We introduce a novel Monte Carlo sampling algorithm based on Langevin diffusions and rejection sampling to solve the problem of sampling from equality-constrained models. Our method has the advantage of being exact for linear constraints and naturally deals with multimodal distributions on arbitrary constraints. We test our method on statistical disaggregation problems for electricity consumption datasets, and our approach provides better uncertainty estimation and accuracy in data imputation than naive and unconstrained methods.
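To make the constraint problem concrete, the following is a minimal sketch of the simplest special case, not the Langevin-plus-rejection algorithm of the paper. It assumes a Gaussian prior over 24 hourly loads and a single linear constraint (the hourly values sum to a known daily total). In that case exact conditional samples can be obtained by correcting unconstrained draws with the standard Gaussian conditioning formula, while naive rejection on the equality accepts nothing. The prior parameters, the daily total, and the function name `sample_gaussian_on_constraint` are all illustrative assumptions.

```python
import numpy as np


def sample_gaussian_on_constraint(mu, Sigma, A, b, n_samples, rng):
    """Draw x ~ N(mu, Sigma) conditioned on A @ x == b.

    Each unconstrained draw x is shifted to
        x + Sigma A^T (A Sigma A^T)^{-1} (b - A x),
    which is an exact sample from the Gaussian conditional.
    Returns (constrained_samples, unconstrained_samples).
    """
    raw = rng.multivariate_normal(mu, Sigma, size=n_samples)  # (n, d)
    S = A @ Sigma @ A.T                                       # (m, m)
    # K = Sigma A^T S^{-1}; uses symmetry of Sigma and S.
    K = np.linalg.solve(S, A @ Sigma).T                       # (d, m)
    residual = b - raw @ A.T                                  # (n, m)
    return raw + residual @ K.T, raw


# Hypothetical setup: 24 hourly loads, prior mean 1.0 and std 0.3 each.
rng = np.random.default_rng(0)
n_hours = 24
mu = np.ones(n_hours)
Sigma = 0.09 * np.eye(n_hours)
A = np.ones((1, n_hours))      # constraint: sum of hourly loads
daily_total = 30.0             # assumed observed daily reading
b = np.array([daily_total])

samples, raw = sample_gaussian_on_constraint(mu, Sigma, A, b, 5000, rng)

# Naive rejection: keep unconstrained draws that hit the total exactly.
naive_accepted = int(np.sum(raw.sum(axis=1) == daily_total))

print("naive rejection accepted:", naive_accepted)
print("max constraint violation:",
      np.abs(samples.sum(axis=1) - daily_total).max())
```

Every corrected sample satisfies the daily total up to floating-point error, whereas the naive filter keeps no draws because the constraint set has zero probability mass under the unconstrained prior. This closed-form correction relies on the prior being Gaussian; the non-Gaussian and multimodal cases that the abstract targets are exactly where it no longer applies.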