AI Summary
This work addresses a key limitation of the conventional TabDDPM model: it assumes that samples are independent and therefore struggles to capture temporal dependencies in time series data. The study proposes an extension of TabDDPM tailored to time series generation that introduces a lightweight temporal adapter and a context-aware embedding module. These components explicitly model temporal context through a sliding-window mechanism, timestep embeddings, conditional activity labels, and missingness masks, preserving the original model's strengths while capturing dynamic dependencies. Evaluated on the WISDM accelerometer dataset, the approach generates sequences that closely replicate real sensor patterns, achieving a classification accuracy of 0.71 and a macro F1-score of 0.64, while substantially improving minority-class representation and alignment with the true statistical distribution.
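The windowed reformulation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names make_windows and sinusoidal_embedding, the zero-filling of missing values, the majority-vote window label and the transformer-style sinusoidal encoding of step positions are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# One accelerometer reading (x, y, z); None marks a missing value.
Reading = Tuple[Optional[float], Optional[float], Optional[float]]


def make_windows(readings: Sequence[Reading], labels: Sequence[str],
                 window: int, stride: int) -> List[dict]:
    """Hypothetical helper: split one user's stream into overlapping windows.

    Assumptions (not from the paper): missing values are zero-filled,
    the mask is 1 for observed and 0 for missing, and each window takes
    the most frequent activity label inside it.
    """
    if len(readings) != len(labels):
        raise ValueError("readings and labels must have the same length")
    windows = []
    for start in range(0, len(readings) - window + 1, stride):
        chunk = readings[start:start + window]
        values = [[0.0 if v is None else float(v) for v in r] for r in chunk]
        mask = [[0 if v is None else 1 for v in r] for r in chunk]
        label = Counter(labels[start:start + window]).most_common(1)[0][0]
        windows.append({
            "values": values,                  # zero-filled readings, shape (window, 3)
            "mask": mask,                      # observed (1) / missing (0), same shape
            "positions": list(range(window)),  # step index within the window
            "label": label,                    # conditioning activity label
        })
    return windows


def sinusoidal_embedding(position: int, dim: int) -> List[float]:
    """Transformer-style sinusoidal encoding of a step index (an assumed choice)."""
    if dim % 2:
        raise ValueError("dim must be even")
    emb = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        emb.extend([math.sin(position * freq), math.cos(position * freq)])
    return emb
```

For example, calling make_windows on a five-step stream with window=3 and stride=2 yields two windows starting at steps 0 and 2; each step's position index could then be passed through sinusoidal_embedding before being combined with the values, mask and label as model input.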
Abstract
Diffusion models are increasingly used to create synthetic tabular and time series data for privacy-preserving augmentation. Tabular Denoising Diffusion Probabilistic Models (TabDDPM) generate high-quality synthetic data from heterogeneous tabular datasets but assume independence between samples, which limits their applicability to time series domains where temporal dependencies are critical. To address this, we propose a temporal extension of TabDDPM that introduces sequence awareness through lightweight temporal adapters and context-aware embedding modules. By reformulating sensor data into windowed sequences and explicitly modeling temporal context via timestep embeddings, conditional activity labels, and observed/missing masks, our approach generates temporally coherent synthetic sequences. Validation using bigram transition matrices and autocorrelation analysis shows improved temporal realism, diversity, and coherence compared with baseline and interpolation techniques. On the WISDM accelerometer dataset, the proposed system produces synthetic time series that closely resemble real-world sensor patterns and achieves comparable classification performance (macro F1-score 0.64, accuracy 0.71). The approach is especially advantageous for representing minority classes and for preserving statistical alignment with the real data distribution. These results demonstrate that diffusion-based models, when equipped for temporal reasoning, provide effective and adaptable solutions for sequential data synthesis. Future work will explore scaling to longer sequences and integrating stronger temporal architectures.
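The two temporal checks named in the abstract can be sketched as follows, assuming that bigram transition matrices are computed over an equal-width discretization of a single accelerometer axis and that autocorrelation is the standard sample estimator. The helper names (discretize, bigram_matrix, autocorrelation, matrix_gap) and the mean-absolute-difference comparison are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def discretize(signal: Sequence[float], lo: float, hi: float, n_bins: int) -> List[int]:
    """Hypothetical helper: map each value to an equal-width bin in [0, n_bins - 1]."""
    if n_bins <= 0 or hi <= lo:
        raise ValueError("need n_bins > 0 and hi > lo")
    width = (hi - lo) / n_bins
    # Values outside [lo, hi] are clamped into the first or last bin.
    return [min(n_bins - 1, max(0, int((v - lo) // width))) for v in signal]


def bigram_matrix(symbols: Sequence[int], n: int) -> List[List[float]]:
    """Row-normalised counts of transitions symbols[t] -> symbols[t + 1]."""
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(symbols, symbols[1:]):
        counts[a][b] += 1
    rows = []
    for row in counts:
        total = sum(row)
        # A state that is never left gets an all-zero row.
        rows.append([c / total if total else 0.0 for c in row])
    return rows


def autocorrelation(signal: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at the given lag (biased estimator)."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((signal[t] - mean) * (signal[t + lag] - mean) for t in range(n - lag))
    return cov / var


def matrix_gap(p: List[List[float]], q: List[List[float]]) -> float:
    """Assumed comparison: mean absolute difference between equal-shape matrices."""
    cells = [abs(a - b) for ra, rb in zip(p, q) for a, b in zip(ra, rb)]
    return sum(cells) / len(cells)
```

Comparing matrix_gap between the bigram matrices of real and synthetic windows, together with their autocorrelations at several lags, gives a rough proxy for the temporal realism the abstract reports; the exact binning and lags used in the paper are not specified here.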