AI Summary
To address the semantic and geometric ambiguities that ego-centric, vision-based 3D Semantic Scene Completion (SSC) methods exhibit in autonomous driving scenes, this paper proposes Ocean, an object-centric framework for SSC. Ocean decomposes the scene into individual object instances, using instance masks extracted from the input image by the lightweight MobileSAM segmentation model. It introduces two attention modules: 3D Semantic Group Attention, which uses linear attention to aggregate object-centric features in 3D space, and Global Similarity-Guided Attention, which uses segmentation features for global interaction to compensate for segmentation errors and missing instances. An Instance-aware Local Diffusion module then improves instance features through a generative process and refines the scene representation in BEV space. On SemanticKITTI and SSCBench-KITTI360, Ocean reports state-of-the-art mIoU scores of 17.40 and 20.28, respectively. Its core contributions are (1) shifting SSC modeling from an ego-centric to an object-centric representation, (2) instance-aware feature aggregation through the two attention modules, and (3) a generative, instance-level refinement step applied in BEV space.
Abstract
Vision-based 3D Semantic Scene Completion (SSC) has received growing attention due to its potential in autonomous driving. While most existing approaches follow an ego-centric paradigm by aggregating and diffusing features over the entire scene, they often overlook fine-grained object-level details, leading to semantic and geometric ambiguities, especially in complex environments. To address this limitation, we propose Ocean, an object-centric prediction framework that decomposes the scene into individual object instances to enable more accurate semantic occupancy prediction. Specifically, we first employ a lightweight segmentation model, MobileSAM, to extract instance masks from the input image. Then, we introduce a 3D Semantic Group Attention module that leverages linear attention to aggregate object-centric features in 3D space. To handle segmentation errors and missing instances, we further design a Global Similarity-Guided Attention module that leverages segmentation features for global interaction. Finally, we propose an Instance-aware Local Diffusion module that improves instance features through a generative process and subsequently refines the scene representation in the BEV space. Extensive experiments on the SemanticKITTI and SSCBench-KITTI360 benchmarks demonstrate that Ocean achieves state-of-the-art performance, with mIoU scores of 17.40 and 20.28, respectively.
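The idea of aggregating features per object instance with linear attention can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the `elu(x) + 1` feature map (a common choice in the linear-attention literature), and the use of a flat array of integer instance ids are all assumptions made for illustration. Each token attends only to tokens that share its instance id, and the cost within a group grows linearly with the number of tokens.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: strictly positive, so attention weights stay non-negative.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v, eps=1e-6):
    # q, k: (n, d); v: (n, d_v). Cost is O(n * d * d_v) instead of O(n^2).
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v                 # (d, d_v) summary of keys and values
    z = k.sum(axis=0)            # (d,) normaliser
    num = q @ kv                 # (n, d_v)
    den = q @ z + eps            # (n,)
    return num / den[:, None]

def grouped_linear_attention(q, k, v, group_ids):
    # Hypothetical helper: run linear attention separately inside each
    # instance group, so features are aggregated per object.
    out = np.zeros((q.shape[0], v.shape[1]), dtype=np.float64)
    for g in np.unique(group_ids):
        idx = np.where(group_ids == g)[0]
        out[idx] = linear_attention(q[idx], k[idx], v[idx])
    return out
```

Because the weights within a group are positive and normalised, each output row is approximately a convex combination of the value rows of the same instance; features from other instances never leak in. A real 3D pipeline would additionally need to lift 2D masks to voxels or queries, which this sketch leaves out.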