PatchScene: Patch-based Voxel Diffusion for Large-Scale Scene Completion

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three challenges in large-scale LiDAR scene completion: loss of geometric detail, spatiotemporal inconsistency, and difficulty reconstructing long-range regions. It proposes a diffusion-based generative framework that operates on local voxel patches and explicitly models fine-grained geometry within localized 3D regions. It adds two further components: a confidence-guided spatiotemporal fusion mechanism that merges overlapping patches and adjacent frames, and an Annular-Flow diffusion strategy that propagates information from near-range to far-range regions, enabling spatially unbounded completion. On SemanticKITTI, the model achieves state-of-the-art results across all standard metrics, outperforming existing approaches in both geometric fidelity and temporal consistency. It also generalizes well: a model trained on 20 m LiDAR ranges completes 50 m scenes without retraining.
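The summary does not say how predictions from overlapping patches are combined. The sketch below assumes each patch predicts a per-voxel occupancy probability plus a confidence score, and that overlaps are merged by a confidence-weighted average. The names `patch_origins` and `fuse_patches`, the sparse dictionary layout, and the weighting rule are all hypothetical illustrations, not PatchScene's published mechanism. Temporal fusion would additionally feed in patches from adjacent frames after transforming them into the current frame's coordinates; this sketch leaves that step out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A voxel is addressed by its integer (x, y, z) index in a global grid.
Voxel = Tuple[int, int, int]


def patch_origins(grid_size: int, patch_size: int, stride: int) -> List[int]:
    """Start indices along one axis so that patches of `patch_size` cover [0, grid_size)."""
    starts = list(range(0, max(grid_size - patch_size, 0) + 1, stride))
    if starts[-1] + patch_size < grid_size:
        starts.append(grid_size - patch_size)  # extra patch flush with the far edge
    return starts


def fuse_patches(patches: List[Dict[Voxel, Tuple[float, float]]]) -> Dict[Voxel, float]:
    """Confidence-weighted average of overlapping patch predictions (hypothetical rule).

    Each patch maps a global voxel index to (occupancy_probability, confidence).
    A voxel covered by several patches gets sum(c * p) / sum(c); voxels whose
    total confidence is zero are dropped.
    """
    weighted: Dict[Voxel, float] = defaultdict(float)
    total: Dict[Voxel, float] = defaultdict(float)
    for patch in patches:
        for voxel, (prob, conf) in patch.items():
            weighted[voxel] += conf * prob
            total[voxel] += conf
    return {v: weighted[v] / total[v] for v in total if total[v] > 0.0}


# Usage: two patches that overlap at voxel (1, 0, 0).
a = {(0, 0, 0): (1.0, 3.0), (1, 0, 0): (1.0, 3.0)}  # confident "occupied"
b = {(1, 0, 0): (0.0, 1.0), (2, 0, 0): (0.0, 1.0)}  # less confident "free"
print(patch_origins(8, 4, 2))  # [0, 2, 4]
print(fuse_patches([a, b]))    # {(0, 0, 0): 1.0, (1, 0, 0): 0.75, (2, 0, 0): 0.0}
```

In 3D the patch origins would be the Cartesian product of `patch_origins` along each axis. Weighting by confidence lets a sure prediction dominate an unsure one in the overlap instead of the two being averaged equally.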

📝 Abstract

We propose PatchScene, a novel diffusion-based framework for large-scale LiDAR scene completion. Unlike existing methods that rely on global latent representations or dense voxel grids, PatchScene adopts a patch-based voxel diffusion paradigm that explicitly generates fine-grained geometry within localized 3D regions. To ensure coherent reconstruction at both spatial and temporal scales, we introduce a confidence-guided spatio-temporal fusion mechanism that integrates overlapping patches and adjacent frames in a unified generative process. Furthermore, we design an Annular-Flow diffusion strategy that leverages the radial density pattern of LiDAR scans to progressively propagate high-fidelity information from near-range to far-range regions, enabling spatially unbounded scene completion. Extensive experiments on the SemanticKITTI benchmark demonstrate that PatchScene achieves state-of-the-art performance across all standard metrics, surpassing previous approaches in both geometric accuracy and temporal consistency. Remarkably, the model trained on 20 m LiDAR ranges generalizes effectively to 50 m scenes without retraining, highlighting its strong scalability and generalization capability for real-world autonomous driving applications.
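The abstract describes Annular-Flow only as propagating information from near-range to far-range regions. A minimal sketch of that ordering follows. It assumes target voxels are grouped into concentric rings around the sensor by horizontal distance and completed one ring at a time, with each step conditioned on everything completed so far. The ring width and the names `ring_index`, `annular_schedule`, `complete_near_to_far`, and `toy_ring` are hypothetical. In PatchScene the per-ring step would be the patch diffusion model; the `toy_ring` stub only imitates it by growing occupancy outward from already-occupied neighbors.

```python
import math
from typing import Callable, Dict, Iterable, List, Set, Tuple

Voxel = Tuple[int, int, int]


def ring_index(voxel: Voxel, voxel_size: float, ring_width: float) -> int:
    """Annulus index from the horizontal distance between the voxel center and the sensor at the origin."""
    x, y, _ = voxel
    r = math.hypot((x + 0.5) * voxel_size, (y + 0.5) * voxel_size)
    return int(r // ring_width)


def annular_schedule(voxels: Iterable[Voxel], voxel_size: float, ring_width: float) -> List[List[Voxel]]:
    """Group voxels into rings and return the rings ordered from near to far."""
    rings: Dict[int, List[Voxel]] = {}
    for v in voxels:
        rings.setdefault(ring_index(v, voxel_size, ring_width), []).append(v)
    return [sorted(rings[k]) for k in sorted(rings)]


def complete_near_to_far(
    observed: Set[Voxel],
    targets: Iterable[Voxel],
    voxel_size: float,
    ring_width: float,
    complete_ring: Callable[[List[Voxel], Set[Voxel]], Set[Voxel]],
) -> Set[Voxel]:
    """Complete one ring at a time; each step is conditioned on all voxels completed so far."""
    scene = set(observed)
    for ring in annular_schedule(targets, voxel_size, ring_width):
        scene |= complete_ring(ring, scene)
    return scene


def toy_ring(ring: List[Voxel], scene: Set[Voxel]) -> Set[Voxel]:
    """Hypothetical stand-in for a generative model: occupy a voxel if its neighbor at x - 1 is occupied."""
    return {v for v in ring if (v[0] - 1, v[1], v[2]) in scene}


targets = [(3, 0, 0), (1, 0, 0), (2, 0, 0)]
print(sorted(complete_near_to_far({(0, 0, 0)}, targets, 1.0, 1.0, toy_ring)))
# [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
print(sorted(complete_near_to_far({(0, 0, 0)}, targets, 1.0, 100.0, toy_ring)))
# [(0, 0, 0), (1, 0, 0)]
```

With narrow rings, each completed ring becomes context for the next, so occupancy reaches `(3, 0, 0)`. Treating all targets as one ring (`ring_width=100.0`) fills only `(1, 0, 0)`, because the outer voxels have no completed neighbor yet. The near-to-far order is what carries information outward.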

Problem

Research questions and friction points this paper is trying to address.

scene completion

LiDAR

large-scale

temporal consistency

spatial generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

patch-based diffusion

spatio-temporal fusion

Annular-Flow