🤖 AI Summary
Existing LiDAR point cloud generation methods struggle to simultaneously achieve precise foreground object localization and photorealistic background rendering, and they lack temporal modeling capabilities, limiting their utility for autonomous driving training and evaluation. This paper introduces DriveLiDAR4D, the first end-to-end sequential LiDAR scene generation pipeline, which unifies controllable foreground editing and high-fidelity background reconstruction through multimodal conditional inputs and a novel sequential noise prediction model, LiDAR4DNet. DriveLiDAR4D is the first method to enable full-scene, spatiotemporally consistent dynamic LiDAR point cloud generation. On nuScenes, it achieves an FRD score of 743.13 and an FVD score of 16.96, outperforming the state-of-the-art UniScene by 37.2% and 24.1%, respectively. These results significantly advance the practicality and controllability of LiDAR simulation for real-world autonomous driving applications.
📝 Abstract
The generation of realistic LiDAR point clouds plays a crucial role in the development and evaluation of autonomous driving systems. Although recent methods for 3D LiDAR point cloud generation have shown significant improvements, they still face notable limitations, including the lack of sequential generation capabilities and the inability to produce accurately positioned foreground objects and realistic backgrounds. These shortcomings hinder their practical applicability. In this paper, we introduce DriveLiDAR4D, a LiDAR generation pipeline that combines multimodal conditions with a novel sequential noise prediction model, LiDAR4DNet, capable of producing temporally consistent LiDAR scenes with highly controllable foreground objects and realistic backgrounds. To the best of our knowledge, this is the first work to address the sequential generation of LiDAR scenes with full scene manipulation capability in an end-to-end manner. We evaluated DriveLiDAR4D on the nuScenes and KITTI datasets, achieving an FRD score of 743.13 and an FVD score of 16.96 on the nuScenes dataset, surpassing the current state-of-the-art (SOTA) method, UniScene, with performance boosts of 37.2% in FRD and 24.1% in FVD, respectively.