EditSSC: Toward Editable Semantic Occupancy Scenes with Unconditional Diffusion Models

📅 2026-06-08
🤖 AI Summary
Existing methods for 3D semantic scene generation rely on complex, 3D-specific architectures that limit both their simplicity and their editing flexibility. This work proposes EditSSC, which reshapes 3D semantic occupancy grids into multi-channel bird’s-eye-view (BEV) images so that the quantized autoencoder and UNet of an off-the-shelf 2D latent diffusion model (Stable Diffusion) can be reused with minimal modifications for unconditional generation. Because diffusion is performed on the latents after quantization and the codebook exhibits class-to-code correspondences, the method supports sketch-guided generation, inpainting, and outpainting without any retraining. On the SemanticKITTI dataset, the approach outperforms existing 3D-specific baselines on unconditional generation.
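To make the BEV conversion concrete, here is a minimal sketch of one way to treat the height axis of an occupancy grid as image channels. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the grid is assumed to be an integer class-label array of shape (X, Y, Z), and the 256×256×32 size is the usual SemanticKITTI scene-completion grid rather than something stated on this page. The actual method may encode classes differently (for example, one-hot channels).

```python
import numpy as np


def occupancy_to_bev(grid: np.ndarray) -> np.ndarray:
    """Turn an (X, Y, Z) class-label grid into a (Z, X, Y) multi-channel BEV image.

    Assumption: each height slice becomes one image channel.
    """
    if grid.ndim != 3:
        raise ValueError("expected a 3D grid of shape (X, Y, Z)")
    # Move the height axis to the front so it plays the role of channels.
    return np.ascontiguousarray(np.moveaxis(grid, 2, 0))


def bev_to_occupancy(bev: np.ndarray) -> np.ndarray:
    """Inverse of occupancy_to_bev: (Z, X, Y) image back to an (X, Y, Z) grid."""
    if bev.ndim != 3:
        raise ValueError("expected a BEV image of shape (Z, X, Y)")
    return np.ascontiguousarray(np.moveaxis(bev, 0, 2))


# Hypothetical example: a random grid with 20 class ids.
rng = np.random.default_rng(0)
grid = rng.integers(0, 20, size=(256, 256, 32))
bev = occupancy_to_bev(grid)
restored = bev_to_occupancy(bev)
```

The reshaping is lossless, so a generated BEV image can be mapped back to a 3D grid without any learned decoder for the height dimension.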
📝 Abstract
3D semantic scene generation is crucial for autonomous driving applications, yet most methods rely on complex 3D-specific architectures such as triplane encoders and adapted diffusion networks, limiting both their simplicity and their editing capabilities. We propose EditSSC, an editing-ready method for 3D semantic scene generation using 2D Bird's Eye View (BEV) representations and an off-the-shelf latent diffusion network. Our approach reshapes 3D semantic occupancy grids into multi-channel BEV images and leverages the quantized autoencoder and UNet from Stable Diffusion with minimal modifications. We perform diffusion on the latents after quantization, which enables training-free editing capabilities. By exploiting class-to-code correspondences in the codebook, our method supports sketch-guided generation, inpainting, and outpainting without any retraining. On SemanticKITTI, EditSSC outperforms existing 3D-specific baselines on unconditional generation, demonstrating that well-established 2D architectures can be effectively repurposed for 3D scene generation and editing.
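The class-to-code idea can be illustrated with a small lookup sketch. Everything below is hypothetical: the abstract does not say how the correspondences are obtained or how they enter the diffusion process, so `class_to_code`, `codebook`, and `sketch_to_latents` are illustrative names, and the table values are made up. The sketch only shows how a coarse class map at latent resolution could be turned into quantized latents once such a table exists.

```python
import numpy as np


def sketch_to_latents(
    sketch: np.ndarray, class_to_code: np.ndarray, codebook: np.ndarray
) -> np.ndarray:
    """Map an (H, W) class sketch to (D, H, W) quantized latents.

    Assumptions (not from the source): class_to_code[c] is the codebook
    index associated with class c, and codebook has shape (K, D).
    """
    code_idx = class_to_code[sketch]      # (H, W) codebook indices
    latents = codebook[code_idx]          # (H, W, D) code vectors
    return np.moveaxis(latents, -1, 0)    # (D, H, W), channels first


# Toy example: 4 codes of dimension 3, two classes.
codebook = np.arange(12, dtype=np.float32).reshape(4, 3)
class_to_code = np.array([2, 0])          # class 0 -> code 2, class 1 -> code 0
sketch = np.array([[0, 1], [1, 0]])
latents = sketch_to_latents(sketch, class_to_code, codebook)
```

Latents built this way could serve as the known or guiding region for sketch-guided generation, inpainting, or outpainting; how they are combined with the denoising steps is left open here because the abstract does not specify it.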
Problem

Research questions and friction points this paper is trying to address.

3D semantic scene generation
editing capability
autonomous driving
diffusion models
semantic occupancy
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic occupancy
unconditional diffusion
BEV representation
training-free editing
latent diffusion