🤖 AI Summary
Existing time-series generation methods struggle to satisfy hard constraints (e.g., power peak limits), preserve sample fidelity, and scale efficiently all at once, particularly in engineering and safety-critical applications. To address this, we propose Constrained Posterior Sampling (CPS): a training-free, diffusion-based sampling algorithm that projects the posterior mean estimate of a pre-trained diffusion model onto the constraint set after each denoising update. We provide theoretical justification for the effect of the projection step on sampling and show that CPS scales to a large number of hard constraints (~100) without additional training. Evaluated on real-world stock, traffic, and air quality datasets, CPS improves sample quality by around 10% and similarity to real time series by around 42% over state-of-the-art methods.
📝 Abstract
Generating realistic time series samples is crucial for stress-testing models and for protecting user privacy through synthetic data. In engineering and safety-critical applications, these samples must meet hard constraints that are domain-specific or imposed by physics. Consider, for example, generating electricity demand patterns with constraints on peak demand times, which can be used to stress-test power grids under adverse weather conditions. Existing approaches for generating constrained time series either do not scale or degrade sample quality. To address these challenges, we introduce Constrained Posterior Sampling (CPS), a diffusion-based sampling algorithm that aims to project the posterior mean estimate into the constraint set after each denoising update. Notably, CPS scales to a large number of constraints (~100) without requiring additional training. We provide theoretical justifications highlighting the impact of our projection step on sampling. Empirically, CPS outperforms state-of-the-art methods in sample quality and similarity to real time series by around 10% and 42%, respectively, on real-world stocks, traffic, and air quality datasets.
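The denoise-then-project idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes affine constraints of the form A x = b (chosen only because their Euclidean projection has a closed form), a deterministic DDIM-style update, and a hypothetical `toy_denoiser` that stands in for a pre-trained noise predictor. The noise schedule, the constraint values, and all function names are illustrative assumptions.

```python
import numpy as np


def project_affine(x, A, b):
    """Euclidean projection of x onto {z : A z = b}; A is assumed full row rank."""
    residual = A @ x - b
    return x - A.T @ np.linalg.solve(A @ A.T, residual)


def toy_denoiser(x_t, alpha_bar_t):
    """Hypothetical stand-in for a pre-trained noise predictor eps_theta(x_t, t).

    It is the exact conditional mean of the noise when the clean data is N(0, I);
    a real model would be a trained network.
    """
    return np.sqrt(1.0 - alpha_bar_t) * x_t


def constrained_sample(A, b, length=24, steps=50, seed=0):
    """Deterministic DDIM-style sampling with a projection after each denoising update."""
    rng = np.random.default_rng(seed)
    # Assumed schedule: index 0 is the least noisy step, the last index the noisiest.
    alpha_bars = np.linspace(0.999, 0.001, steps)
    x = rng.standard_normal(length)
    for i in range(steps - 1, -1, -1):
        ab = alpha_bars[i]
        eps = toy_denoiser(x, ab)
        # Posterior mean estimate of the clean series given the noisy x.
        x0_hat = (x - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)
        # Move the estimate onto the constraint set.
        x0_proj = project_affine(x0_hat, A, b)
        if i == 0:
            x = x0_proj  # final step: return the projected clean estimate
        else:
            ab_prev = alpha_bars[i - 1]
            x = np.sqrt(ab_prev) * x0_proj + np.sqrt(1.0 - ab_prev) * eps
    return x


# Illustrative constraints on a 24-step series:
# the values sum to 24.0 and the value at index 18 equals 5.0.
A = np.zeros((2, 24))
A[0, :] = 1.0
A[1, 18] = 1.0
b = np.array([24.0, 5.0])

sample = constrained_sample(A, b)
```

Because the last iteration returns the projected estimate directly, `A @ sample` matches `b` up to floating-point error. General constraint sets (for example inequality or nonlinear constraints) have no closed-form projection and would need a numerical solver in place of `project_affine`.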