Diffusion-Based Generation and Imputation of Driving Scenarios from Limited Vehicle CAN Data

📅 2025-09-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the scarcity, noise, and corrupted samples typical of in-vehicle CAN bus time-series data, this paper proposes a diffusion-based method for generating and repairing driving-scenario time series. Methodologically, it designs a hybrid diffusion framework that integrates autoregressive and non-autoregressive mechanisms, enhances the DDPM architecture with physics-informed constraints (acceleration continuity, dynamical bounds, and trajectory consistency), and introduces three interpretable, physics-grounded evaluation metrics. The key contributions are: (i) the first application of physics-augmented diffusion models to CAN time-series generation and imputation; (ii) synthetically generated samples that exhibit higher physical plausibility than the raw training data; and (iii) effective detection and correction of anomalous segments, which improves dataset quality and behavioral realism. The approach provides a robust data augmentation solution for autonomous driving training under small-sample, high-noise vehicular data conditions.

📝 Abstract

Training deep learning methods on small time series datasets that also include corrupted samples is challenging. Diffusion models have been shown to be effective at generating realistic synthetic data and at correcting corrupted samples through imputation. In this context, this paper focuses on generating synthetic yet realistic samples of automotive time series data. We show that denoising diffusion probabilistic models (DDPMs) can effectively solve this task by applying them to a challenging vehicle CAN dataset with long-term data and a limited number of samples. To this end, we propose a hybrid generative approach that combines autoregressive and non-autoregressive techniques. We evaluate our approach with two recently proposed DDPM architectures for time series generation, for which we propose several improvements. To evaluate the generated samples, we propose three metrics that quantify physical correctness and test track adherence. Our best model is able to outperform even the training data in terms of physical correctness, while showing plausible driving behavior. Finally, we use our best model to successfully impute physically implausible regions in the training data, thereby improving the data quality.
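For readers unfamiliar with DDPMs, the forward (noising) process that such models learn to invert can be sketched as below. This is a minimal illustration of the standard closed-form step x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, not the architecture evaluated in the paper. The linear schedule, its default values, and all function names are assumptions chosen for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T.

    The default endpoints are the commonly used DDPM values; they are an
    assumption here, not settings reported by the paper.
    """
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) in closed form.

    x0 is a list of floats (one signal channel); t is a 0-based step index.
    Returns the noised signal and the Gaussian noise that was added.
    """
    scale = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [scale * x + sigma * e for x, e in zip(x0, noise)]
    return x_t, noise


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    speed = [10.0, 10.5, 11.0, 11.2]  # hypothetical speed trace
    noised, _ = q_sample(speed, 500, alpha_bars, rng)
    print(noised)
```

A denoising network is then trained to predict the added noise from x_t and t; generation runs the learned reverse process starting from pure Gaussian noise.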

Problem

Research questions and friction points this paper is trying to address.

Generate synthetic yet realistic automotive time series data

Impute corrupted samples in limited vehicle CAN datasets

Overcome challenges of small datasets with diffusion models
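One common way to impute with a diffusion model is to keep trusted observations fixed and let the model fill in only the flagged region at each reverse step. The sketch below shows just the masking and merging logic of that idea. It is an assumption about how such imputation could be wired up, not the paper's procedure, and the function names and the `margin` parameter are hypothetical.

```python
def build_keep_mask(implausible, margin=2):
    """Return a mask where True marks samples to keep as observed.

    `implausible` is a list of booleans flagging corrupted time steps.
    Each flagged step is widened by `margin` samples on both sides so the
    model also regenerates the transition into and out of the bad region.
    """
    n = len(implausible)
    keep = [True] * n
    for i, bad in enumerate(implausible):
        if bad:
            for j in range(max(0, i - margin), min(n, i + margin + 1)):
                keep[j] = False
    return keep


def merge_known_and_generated(known, generated, keep):
    """Combine observed and model-generated samples for one reverse step.

    Where keep[i] is True the observed value is used; elsewhere the
    model's proposal fills the gap.
    """
    if not (len(known) == len(generated) == len(keep)):
        raise ValueError("all inputs must have the same length")
    return [k if m else g for k, g, m in zip(known, generated, keep)]


if __name__ == "__main__":
    flags = [False, False, False, True, False, False, False]
    keep = build_keep_mask(flags, margin=1)
    observed = [1.0, 2.0, 3.0, 99.0, 5.0, 6.0, 7.0]  # 99.0 is the corrupted sample
    proposal = [0.0] * 7  # stand-in for a model output
    print(merge_known_and_generated(observed, proposal, keep))
```

In a full sampler the observed values would be noised to the current diffusion step before merging, so that both parts of the signal sit at the same noise level.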

Innovation

Methods, ideas, or system contributions that make the work stand out.

Denoising diffusion probabilistic models for data generation

Hybrid autoregressive and non-autoregressive generative approach

Three metrics evaluating physical correctness and test track adherence
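The three metrics themselves are not defined on this page. As a rough illustration of what a physical-correctness check on a speed signal might look like, the sketch below derives acceleration by finite differences and reports the fraction of steps that exceed a bound. The metric, the function names, and the 10 m/s² threshold are all hypothetical and are not taken from the paper.

```python
def finite_difference_acceleration(speed_mps, dt_s):
    """Approximate acceleration (m/s^2) from consecutive speed samples (m/s)."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    return [(b - a) / dt_s for a, b in zip(speed_mps, speed_mps[1:])]


def bound_violation_rate(speed_mps, dt_s, max_abs_accel=10.0):
    """Fraction of time steps whose implied acceleration exceeds the bound.

    `max_abs_accel` is an illustrative threshold, not a value from the paper.
    Returns 0.0 for signals too short to differentiate.
    """
    accel = finite_difference_acceleration(speed_mps, dt_s)
    if not accel:
        return 0.0
    violations = sum(1 for a in accel if abs(a) > max_abs_accel)
    return violations / len(accel)


if __name__ == "__main__":
    # Hypothetical trace sampled at 1 Hz with one implausible jump.
    trace = [0.0, 1.0, 2.0, 30.0, 31.0]
    print(bound_violation_rate(trace, dt_s=1.0))
```

A score of this kind can be computed for both generated and recorded traces, which is what makes a comparison such as "generated samples are more physically plausible than the training data" measurable.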
