VideoGAN-based Trajectory Proposal for Automated Vehicles

📅 2025-06-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of simultaneously achieving statistical accuracy and physical plausibility in multi-agent trajectory prediction for autonomous driving, this paper proposes an end-to-end trajectory generation framework based on VideoGAN. The method takes low-resolution bird's-eye-view (BEV) occupancy grid videos as input and uses a VideoGAN to model the multimodal distribution of future trajectories. Vehicle trajectories are then decoded from the generated videos by combining single-frame object detection with frame-to-frame object association. This generative approach sidesteps the difficulty conventional models have in capturing complex agent interactions. Trained on the Waymo Open Motion Dataset in under 100 GPU-hours, the model achieves inference latency below 20 ms. Quantitative evaluation shows clear gains over rule-based and classical learning-based baselines, particularly in spatial relationships (e.g., relative pose) and kinematic consistency (e.g., velocity and acceleration profiles).

📝 Abstract
Being able to generate realistic trajectory options is at the core of increasing the degree of automation of road vehicles. While model-driven, rule-based, and classical learning-based methods are widely used to tackle these tasks at present, they can struggle to effectively capture the complex, multimodal distributions of future trajectories. In this paper we investigate whether a generative adversarial network (GAN) trained on videos of bird's-eye view (BEV) traffic scenarios can generate statistically accurate trajectories that correctly capture spatial relationships between the agents. To this end, we propose a pipeline that uses low-resolution BEV occupancy grid videos as training data for a video generative model. From the generated videos of traffic scenarios we extract abstract trajectory data using single-frame object detection and frame-to-frame object matching. We particularly choose a GAN architecture for the fast training and inference times with respect to diffusion models. We obtain our best results within 100 GPU hours of training, with inference times under 20 ms. We demonstrate the physical realism of the proposed trajectories in terms of distribution alignment of spatial and dynamic parameters with respect to the ground truth videos from the Waymo Open Motion Dataset.
Problem

Research questions and friction points this paper is trying to address.

Generate realistic trajectory options for automated vehicles
Capture complex multimodal distributions of future trajectories
Ensure spatial relationships between agents are correctly modeled
Innovation

Methods, ideas, or system contributions that make the work stand out.

GAN for realistic trajectory generation
BEV video training data pipeline
GAN chosen over diffusion models for faster training and inference
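The trajectory-extraction stage (single-frame object detection followed by frame-to-frame object matching on generated BEV occupancy videos) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the blob-based detector, the greedy nearest-centroid matcher, and all thresholds are assumptions chosen for clarity.

```python
import numpy as np
from collections import deque

def detect_centroids(frame, thresh=0.5):
    """Single-frame detection (illustrative): threshold the occupancy grid,
    group occupied cells into 4-connected blobs via BFS, return centroids."""
    mask = frame > thresh
    seen = np.zeros_like(mask, dtype=bool)
    centroids = []
    for r, c in zip(*np.nonzero(mask)):
        if seen[r, c]:
            continue
        blob, queue = [], deque([(r, c)])
        seen[r, c] = True
        while queue:
            y, x = queue.popleft()
            blob.append((y, x))
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        centroids.append(np.mean(blob, axis=0))  # (row, col) center of blob
    return centroids

def link_tracks(frames, max_dist=3.0):
    """Frame-to-frame association (illustrative): greedily extend each track
    with the nearest unclaimed detection within max_dist grid cells."""
    tracks = [[c] for c in detect_centroids(frames[0])]
    for frame in frames[1:]:
        cents = detect_centroids(frame)
        claimed = set()
        for tr in tracks:
            best, best_d = None, max_dist
            for j, c in enumerate(cents):
                d = np.linalg.norm(c - tr[-1])
                if j not in claimed and d < best_d:
                    best, best_d = j, d
            if best is not None:
                tr.append(cents[best])
                claimed.add(best)
    return tracks

# Toy scenario: one 2x2 "vehicle" blob moving one cell right per frame.
frames = np.zeros((4, 16, 16))
for t in range(4):
    frames[t, 8:10, 4 + t:6 + t] = 1.0
tracks = link_tracks(frames)
# tracks[0] is a 4-point centroid trajectory; velocity and acceleration
# profiles follow from finite differences at the video frame rate.
```

Per-frame velocities (and, by a second difference, accelerations) can then be compared in distribution against ground-truth Waymo videos, which is how the paper assesses kinematic consistency.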