Towards 3D Object-Centric Feature Learning for Semantic Scene Completion

📅 2025-11-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address fine-grained semantic and geometric ambiguity in ego-centric, vision-based 3D Semantic Scene Completion (SSC) for autonomous driving, this paper proposes Ocean, an object-centric framework for SSC. Ocean first extracts instance masks from the input image with the lightweight segmentation model MobileSAM, decomposing the scene into instance-level units. It then applies two attention modules: 3D Semantic Group Attention, which uses linear attention to aggregate object-centric features in 3D space, and Global Similarity-Guided Attention, which uses segmentation features for global interaction to compensate for segmentation errors and missing instances. Finally, an Instance-aware Local Diffusion module improves instance features through a generative process and refines the scene representation in BEV space. Evaluated on SemanticKITTI and SSCBench-KITTI360, Ocean achieves state-of-the-art mIoU scores of 17.40 and 20.28, respectively. Its core contributions are (1) shifting SSC modeling from an ego-centric to an object-centric representation, (2) instance-aware feature aggregation through the two attention modules, and (3) a generative, diffusion-based local refinement of instance features applied in BEV space.

📝 Abstract
Vision-based 3D Semantic Scene Completion (SSC) has received growing attention due to its potential in autonomous driving. While most existing approaches follow an ego-centric paradigm by aggregating and diffusing features over the entire scene, they often overlook fine-grained object-level details, leading to semantic and geometric ambiguities, especially in complex environments. To address this limitation, we propose Ocean, an object-centric prediction framework that decomposes the scene into individual object instances to enable more accurate semantic occupancy prediction. Specifically, we first employ a lightweight segmentation model, MobileSAM, to extract instance masks from the input image. Then, we introduce a 3D Semantic Group Attention module that leverages linear attention to aggregate object-centric features in 3D space. To handle segmentation errors and missing instances, we further design a Global Similarity-Guided Attention module that leverages segmentation features for global interaction. Finally, we propose an Instance-aware Local Diffusion module that improves instance features through a generative process and subsequently refines the scene representation in the BEV space. Extensive experiments on the SemanticKITTI and SSCBench-KITTI360 benchmarks demonstrate that Ocean achieves state-of-the-art performance, with mIoU scores of 17.40 and 20.28, respectively.
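The abstract describes linear attention that aggregates features per object instance. As a rough illustration only, the sketch below restricts a generic linear attention to tokens that share an instance id. The `elu(x) + 1` feature map, the function names, and the flat token layout are assumptions made for this example; the abstract does not specify how the paper's 3D Semantic Group Attention is implemented.

```python
import math
from collections import defaultdict


def feature_map(x):
    # elu(x) + 1: a common positive feature map for linear attention
    # (an assumption here, not taken from the paper).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def grouped_linear_attention(queries, keys, values, group_ids):
    """Linear attention in which each token attends only to tokens of its group.

    queries, keys: lists of length-d_k vectors; values: lists of length-d_v
    vectors; group_ids: one hypothetical instance id per token.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    # Per-group running sums: sum_j phi(k_j) v_j^T and sum_j phi(k_j).
    kv_sum = defaultdict(lambda: [[0.0] * d_v for _ in range(d_k)])
    k_sum = defaultdict(lambda: [0.0] * d_k)
    for k, v, g in zip(keys, values, group_ids):
        pk = feature_map(k)
        for a in range(d_k):
            k_sum[g][a] += pk[a]
            for b in range(d_v):
                kv_sum[g][a][b] += pk[a] * v[b]

    out = []
    for q, g in zip(queries, group_ids):
        pq = feature_map(q)
        # Normaliser phi(q)^T sum_j phi(k_j); positive because phi > 0.
        norm = sum(pq[a] * k_sum[g][a] for a in range(d_k))
        out.append([
            sum(pq[a] * kv_sum[g][a][b] for a in range(d_k)) / norm
            for b in range(d_v)
        ])
    return out
```

Because the key-value sums are accumulated once per group, the cost grows linearly with the number of tokens rather than quadratically, which is the usual motivation for linear attention.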
Problem

Research questions and friction points this paper is trying to address.

Addressing semantic and geometric ambiguities in 3D scene completion
Recovering fine-grained object-level details that ego-centric methods overlook in complex environments
Improving semantic occupancy prediction through object-centric decomposition
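Semantic occupancy prediction is scored with mIoU, the metric behind the 17.40 and 20.28 figures quoted in the abstract (benchmarks typically report it multiplied by 100). The sketch below shows one way to compute it over flattened voxel labels. The `ignore_label` and `empty_class` defaults and the choice to skip classes that appear in neither prediction nor ground truth are assumptions for illustration; the official benchmark tooling defines the exact protocol.

```python
def mean_iou(pred, target, num_classes, ignore_label=255, empty_class=0):
    """Mean intersection-over-union across semantic classes.

    pred, target: flat sequences of integer class labels, one per voxel.
    Voxels whose target equals ignore_label are skipped, and empty_class
    is left out of the average (both are assumptions of this sketch).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [
        inter[c] / union[c]
        for c in range(num_classes)
        if c != empty_class and union[c] > 0
    ]
    return sum(ious) / len(ious) if ious else 0.0
```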
Innovation

Methods, ideas, or system contributions that make the work stand out.

Object-centric framework decomposes scenes into instances
3D Semantic Group Attention aggregates object-centric features
Instance-aware Local Diffusion refines scene representation in BEV
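To make the idea of decomposing a scene into instances concrete, the sketch below pools per-pixel features into one feature vector per instance, given an instance id for every pixel (such as the ids a segmentation model like MobileSAM could provide). The function name, the mean-pooling choice, and the `background_id` convention are hypothetical and used only for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def pool_instance_features(features, instance_ids, background_id=-1):
    """Average the feature vectors that share an instance id.

    features: list of equal-length feature vectors, one per pixel.
    instance_ids: one integer id per pixel; background_id marks pixels
    that belong to no instance (a convention assumed for this sketch).
    Returns a dict mapping instance id to its mean feature vector.
    """
    sums = {}
    counts = defaultdict(int)
    for feat, inst in zip(features, instance_ids):
        if inst == background_id:
            continue
        if inst not in sums:
            sums[inst] = [0.0] * len(feat)
        for d, v in enumerate(feat):
            sums[inst][d] += v
        counts[inst] += 1
    return {inst: [s / counts[inst] for s in sums[inst]] for inst in sums}
```

Each pooled vector can then act as a compact object-level token, which is the kind of unit an object-centric model reasons over instead of the whole scene at once.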
Weihua Wang
Faculty of Robot Science and Engineering, Northeastern University, Shenyang, Liaoning, China
Yubo Cui
Northeastern University
3D computer vision, object tracking, robot
Xiangru Lin
The University of Hong Kong
Zhiheng Li
Faculty of Robot Science and Engineering, Northeastern University, Shenyang, Liaoning, China
Zheng Fang
National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, Liaoning, China