🤖 AI Summary
Existing generative methods for irregularly sampled time series with missing values often yield suboptimal samples at high computational cost, and naively masking missing entries with zeros in an image-based model creates distorted ("unnatural") neighborhoods. This paper proposes a two-stage generative framework: first, a Time Series Transformer completes the raw irregular sequence so that every value has natural neighbors; second, the completed sequence is converted into an image-like representation and modeled by a vision-based diffusion model built on ImagenTime, with masking that reduces the model's dependence on the completed values. The design thus combines completion and masking instead of relying on either alone. The reported results are state of the art: a relative improvement of 70% in discriminative score (the measure of generation quality) and an 85% reduction in computational cost. The method is also described as robust under varying missingness patterns and sampling irregularity.
📝 Abstract
Generating realistic time series data is critical for applications in healthcare, finance, and science. However, irregular sampling and missing values present significant challenges. While prior methods address these irregularities, they often yield suboptimal results and incur high computational costs. Recent advances in regular time series generation, such as the diffusion-based ImagenTime model, demonstrate strong, fast, and scalable generative capabilities by transforming time series into image representations, which makes them a promising starting point. However, extending ImagenTime to irregular sequences using simple masking introduces "unnatural" neighborhoods, where missing values replaced by zeros disrupt the learning process. To overcome this, we propose a novel two-step framework: first, a Time Series Transformer completes irregular sequences, creating natural neighborhoods; second, a vision-based diffusion model with masking minimizes dependence on the completed values. This approach leverages the strengths of both completion and masking, enabling robust and efficient generation of realistic time series. Our method achieves state-of-the-art performance, with a relative improvement of $70\%$ in discriminative score and $85\%$ in computational cost. Code is available at https://github.com/azencot-group/ImagenI2R.
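The interplay between completion and masking described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: linear interpolation stands in for the Time Series Transformer, the loss is a plain mean squared error in place of the diffusion objective, and all function names (`observed_mask`, `zero_fill`, `interpolate_fill`, `masked_mse`) are hypothetical. It only shows that completed values give each observation plausible neighbors, while the mask keeps those completed values out of the training loss.

```python
from typing import List, Optional


def observed_mask(series: List[Optional[float]]) -> List[bool]:
    """True where a value was actually observed, False where it is missing."""
    return [v is not None for v in series]


def zero_fill(series: List[Optional[float]]) -> List[float]:
    """Simple masking: missing entries become zeros ("unnatural" neighbors)."""
    return [0.0 if v is None else v for v in series]


def interpolate_fill(series: List[Optional[float]]) -> List[float]:
    """Assumed stand-in for the completion step (the paper uses a Time Series
    Transformer). Interpolates linearly between the nearest observed values;
    gaps at the edges copy the nearest observed value."""
    obs = [i for i, v in enumerate(series) if v is not None]
    if not obs:
        raise ValueError("series has no observed values")
    filled = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(v)
            continue
        left = max((j for j in obs if j < i), default=None)
        right = min((j for j in obs if j > i), default=None)
        if left is None:
            filled.append(series[right])
        elif right is None:
            filled.append(series[left])
        else:
            w = (i - left) / (right - left)
            filled.append((1 - w) * series[left] + w * series[right])
    return filled


def masked_mse(pred: List[float], target: List[float], mask: List[bool]) -> float:
    """Mean squared error over observed positions only, so errors on
    completed (imputed) positions never contribute to the loss."""
    terms = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not terms:
        raise ValueError("mask selects no positions")
    return sum(terms) / len(terms)


series = [1.0, None, 3.0, None, None, 6.0]
mask = observed_mask(series)
zeros = zero_fill(series)             # observed values sit next to zeros
completed = interpolate_fill(series)  # observed values sit next to plausible ones
```

In this toy setting, a model input built from `completed` has smooth local structure, whereas `zeros` places artificial drops beside every observation. Scoring with `masked_mse` means an inaccurate completion can still shape the neighborhood without being treated as ground truth.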