🤖 AI Summary
Existing 4D LiDAR scene generation methods overlook the substantial differences in perceptual difficulty and uncertainty across spatial regions. To address this, this work proposes U4D, a novel framework that explicitly incorporates spatial uncertainty into the generation pipeline for the first time. Specifically, U4D constructs an uncertainty map by computing point-wise Shannon entropy over the predictions of a pretrained segmentor. It then employs a two-stage diffusion mechanism that follows a "hard-to-easy" strategy: the first stage synthesizes geometric structures in high-entropy regions, and the second stage completes the remaining regions conditioned on those structures. Additionally, it introduces the MoST (Mixture of Spatio-Temporal) module, which dynamically fuses spatial and temporal features to enhance temporal consistency. Experiments on the nuScenes and SemanticKITTI benchmarks show that U4D improves scene fidelity, temporal coherence, and downstream task performance.
📝 Abstract
Constructing faithful 4D worlds from LiDAR-acquired sequences is crucial for embodied AI, yet current generative frameworks apply uniform modeling capacity across all spatial regions. This ignores that perceptual difficulty varies dramatically within a single scan: distant surfaces, occluded boundaries, and small-scale objects carry far higher uncertainty than well-observed structures. We present U4D, a new framework that explicitly leverages spatial uncertainty to guide LiDAR scene generation in a "hard-to-easy" schedule. U4D derives per-point uncertainty maps via Shannon entropy from a pretrained segmentor, then applies an unconditional diffusion stage to synthesize high-entropy areas with precise geometry, followed by a conditional completion stage that fills in the remaining regions using these structures as priors. A MoST (Mixture of Spatio-Temporal) block further maintains cross-frame coherence by dynamically balancing spatial detail and temporal continuity. Extensive experiments on nuScenes and SemanticKITTI demonstrate state-of-the-art scene fidelity, temporal consistency, and downstream performance.
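The uncertainty-map step can be illustrated with a minimal sketch. It assumes the pretrained segmentor outputs per-point class probabilities, and it splits points into "hard" and "easy" sets by keeping a fixed fraction of the highest-entropy points. The function names, the `hard_fraction` parameter, and the top-fraction rule are hypothetical choices made for illustration; the abstract does not specify how U4D selects high-entropy regions.

```python
import numpy as np


def shannon_entropy(probs, eps=1e-12):
    """Per-point Shannon entropy in nats.

    probs: array of shape (N, C), each row a class distribution
    (assumed to be the softmax output of a pretrained segmentor).
    Returns an array of shape (N,).
    """
    p = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=-1)


def split_hard_easy(entropy, hard_fraction=0.2):
    """Boolean mask that is True for the highest-entropy points.

    hard_fraction is a hypothetical knob, not a value from the paper.
    Ties at the threshold are all marked as hard.
    """
    k = max(1, int(round(hard_fraction * entropy.shape[0])))
    threshold = np.partition(entropy, -k)[-k]  # k-th largest entropy
    return entropy >= threshold


# Toy example: three points, three classes.
probs = np.array([
    [1.0, 0.0, 0.0],        # confident prediction, entropy near 0
    [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain, entropy = log(3)
    [0.5, 0.5, 0.0],        # two-way ambiguity, entropy = log(2)
])
entropy = shannon_entropy(probs)
hard_mask = split_hard_easy(entropy, hard_fraction=0.34)
```

Under the schedule the abstract describes, points where `hard_mask` is True would correspond to the regions generated first by the unconditional stage, and the remaining points to the regions filled in by the conditional completion stage.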