Diffusion-Based Generation and Imputation of Driving Scenarios from Limited Vehicle CAN Data

📅 2025-09-15
🤖 AI Summary
To address scarcity, noise, and corrupted samples in in-vehicle CAN bus time-series data, this paper proposes a diffusion-based method for generating and repairing driving-scenario time series. The method is a hybrid diffusion framework that integrates autoregressive and non-autoregressive mechanisms, enhances the DDPM architecture with physics-informed constraints (acceleration continuity, dynamical bounds, and trajectory consistency), and introduces three interpretable, physics-grounded evaluation metrics. The key contributions are: (i) the first application of physics-augmented diffusion models to CAN time-series generation and imputation; (ii) synthetically generated samples that exhibit higher physical plausibility than the raw training data under those metrics; and (iii) effective detection and correction of anomalous segments, which improves dataset quality and behavioral realism. The approach offers a data augmentation option for autonomous driving training when vehicle data is scarce and noisy.

📝 Abstract
Training deep learning methods on small time series datasets that also include corrupted samples is challenging. Diffusion models have been shown to be effective at generating realistic synthetic data and at correcting corrupted samples through imputation. In this context, this paper focuses on generating synthetic yet realistic samples of automotive time series data. We show that denoising diffusion probabilistic models (DDPMs) can effectively solve this task by applying them to a challenging vehicle CAN dataset with long-term data and a limited number of samples. To this end, we propose a hybrid generative approach that combines autoregressive and non-autoregressive techniques. We evaluate our approach with two recently proposed DDPM architectures for time series generation, for which we propose several improvements. To evaluate the generated samples, we propose three metrics that quantify physical correctness and test track adherence. Our best model outperforms even the training data in terms of physical correctness while showing plausible driving behavior. Finally, we use our best model to impute physically implausible regions in the training data, thereby improving the data quality.
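For readers unfamiliar with DDPMs, the following is a minimal sketch of the standard forward (noising) process that such models learn to reverse. It is not the paper's architecture: the linear noise schedule, the function names, and the example speed trace are all assumptions made for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


# Hypothetical speed trace in m/s, standing in for one CAN signal.
speed = [10.0, 10.5, 11.0, 11.2, 11.1, 10.8]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
rng = random.Random(0)
slightly_noisy = q_sample(speed, 10, alpha_bars, rng)   # still close to the data
mostly_noise = q_sample(speed, 999, alpha_bars, rng)    # nearly pure Gaussian noise
```

Generation runs this process in reverse: a trained network starts from noise and removes it step by step until a plausible time series remains.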
Problem

Research questions and friction points this paper is trying to address.

Generate synthetic realistic automotive time series data
Impute corrupted samples in limited vehicle CAN datasets
Overcome challenges of small datasets with diffusion models
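As a rough illustration of the imputation task, the sketch below flags samples whose implied acceleration exceeds a bound and then replaces only those samples with values from a generative model. The threshold, the helper names, and the `proposal` list (a stand-in for a diffusion model's output) are hypothetical; the page does not describe how the paper detects or fills such regions.

```python
def implausible_mask(speed, dt, max_abs_accel):
    """Flag both samples around any step whose implied |acceleration| is too large.

    Flagging both neighbours is deliberately conservative: a single jump does
    not reveal which of the two samples is the corrupted one.
    """
    mask = [False] * len(speed)
    for i in range(1, len(speed)):
        accel = (speed[i] - speed[i - 1]) / dt
        if abs(accel) > max_abs_accel:
            mask[i - 1] = True
            mask[i] = True
    return mask


def merge_imputed(original, proposal, mask):
    """Keep trusted samples; take the proposal only where the mask is True."""
    return [p if m else o for o, p, m in zip(original, proposal, mask)]


# Hypothetical 10 Hz speed trace (m/s) with one corrupted sample at index 3.
speed = [10.0, 10.2, 10.4, 25.0, 10.8, 11.0]
mask = implausible_mask(speed, dt=0.1, max_abs_accel=10.0)

# Placeholder for what a trained diffusion model might propose for this window.
proposal = [10.0, 10.2, 10.4, 10.6, 10.8, 11.0]
repaired = merge_imputed(speed, proposal, mask)
```

Restricting the replacement to masked samples leaves the trusted measurements untouched, which is the usual reason to treat repair as imputation rather than full regeneration.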
Innovation

Methods, ideas, or system contributions that make the work stand out.

Denoising diffusion probabilistic models for data generation
Hybrid autoregressive and non-autoregressive generative approach
Three metrics evaluating physical correctness and adherence
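One plausible reading of a hybrid autoregressive and non-autoregressive scheme is sketched below: each window is produced in one shot (non-autoregressive), and windows are chained by conditioning on the tail of what has been generated so far (autoregressive). This is an assumption about the general idea, not the paper's method, and `generate_window` is a hypothetical random-walk placeholder for a diffusion sampler.

```python
import random


def generate_window(context, window_len, rng):
    """Placeholder for a sampler that emits a whole window at once.

    Hypothetical stand-in: a random walk continuing from the last context value.
    """
    value = context[-1] if context else 0.0
    window = []
    for _ in range(window_len):
        value += rng.gauss(0.0, 0.1)
        window.append(value)
    return window


def generate_long_series(total_len, window_len, context_len, seed=0):
    """Chain windows, conditioning each one on the tail of the series so far."""
    rng = random.Random(seed)
    series = []
    while len(series) < total_len:
        context = series[-context_len:] if context_len > 0 else []
        series.extend(generate_window(context, window_len, rng))
    return series[:total_len]


# 250 samples built from 100-sample windows, each seeing the previous 20 samples.
series = generate_long_series(total_len=250, window_len=100, context_len=20)
```

Chaining windows this way lets a model trained on short segments produce long recordings, at the cost of errors that can accumulate from one window to the next.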
Julian Ripper
Telecooperation Lab at Technical University of Darmstadt, Darmstadt, Germany
Ousama Esbel
Compredict GmbH, Darmstadt, Germany
Rafael Fietzek
Compredict GmbH, Darmstadt, Germany
Max Mühlhäuser
Professor of Computer Science, Technische Universität Darmstadt
Ubiquitous Computing, HCI, Privacy & Trust, Dist. Systems & Networks
Thomas Kreutz
Telecooperation Lab at Technical University of Darmstadt, Darmstadt, Germany