AI Summary
This work addresses the limitations of existing Next-Best-View (NBV) methods in cluttered scenes: they rely solely on geometric cues, neglect semantic information, and favor exploitation over exploration, which often results in incomplete reconstructions. To overcome these issues, the authors propose an instance-aware NBV strategy that encodes instance-level semantics into one-hot object vectors and uses them to compute a confidence-weighted information gain that identifies under-observed regions. The approach also supports object-centric viewpoint planning. Integrated within a 3D Gaussian Splatting (3DGS) framework, the method reduces depth error by up to 77.14% on synthetic data and 34.10% on the real-world GraspNet dataset compared to baselines. Restricting NBV to a specific object reduces that object's depth error by a further 25.60% relative to targeting the whole scene, and real-world robotic manipulation experiments validate the approach.
Abstract
In cluttered scenes with inevitable occlusions and incomplete observations, selecting informative viewpoints is essential for building a reliable representation. In this context, 3D Gaussian Splatting (3DGS) offers a distinct advantage, as it can explicitly guide the selection of subsequent viewpoints and then refine the representation with new observations. However, existing approaches rely solely on geometric cues, neglect manipulation-relevant semantics, and tend to prioritize exploitation over exploration. To tackle these limitations, we introduce an instance-aware Next Best View (NBV) policy that prioritizes underexplored regions by leveraging object features. Specifically, our object-aware 3DGS distills instance-level information into one-hot object vectors, which are used to compute confidence-weighted information gain that guides the identification of regions associated with erroneous and uncertain Gaussians. Furthermore, our method can be easily adapted to an object-centric NBV, which focuses view selection on a target object, thereby improving reconstruction robustness to object placement. Experiments demonstrate that our NBV policy reduces depth error by up to 77.14% on the synthetic dataset and 34.10% on the real-world GraspNet dataset compared to baselines. Moreover, compared to targeting the entire scene, performing NBV on a specific object yields an additional reduction of 25.60% in depth error for that object. We further validate the effectiveness of our approach through real-world robotic manipulation tasks.
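The abstract does not give the scoring formula, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes each Gaussian carries a probability vector over object instances, uses the entropy of that vector as a stand-in for the confidence weighting, and assumes a per-Gaussian geometric gain (`base_gain`) is already available. All names (`view_score`, `next_best_view`, `base_gain`, the candidate view labels) are hypothetical.

```python
import math

def one_hot(label, num_objects):
    """Encode an instance id as a one-hot object vector."""
    return [1.0 if i == label else 0.0 for i in range(num_objects)]

def entropy(probs):
    """Shannon entropy (nats) of a per-Gaussian object distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def view_score(visible, object_probs, base_gain, target=None):
    """Sum uncertainty-weighted gain over the Gaussians visible from one view.

    Assumption: low-confidence (high-entropy) Gaussians get a larger weight.
    If `target` is set, only Gaussians whose most likely object is `target`
    contribute, which gives an object-centric score.
    """
    score = 0.0
    for g in visible:
        probs = object_probs[g]
        if target is not None and probs.index(max(probs)) != target:
            continue
        score += entropy(probs) * base_gain[g]
    return score

def next_best_view(candidates, object_probs, base_gain, target=None):
    """Return the candidate view with the highest score."""
    return max(
        candidates,
        key=lambda v: view_score(candidates[v], object_probs, base_gain, target),
    )

# Four Gaussians, two object instances (made-up illustration data).
object_probs = [one_hot(0, 2), [0.4, 0.6], [0.1, 0.9], [0.6, 0.4]]
base_gain = [1.0, 1.0, 1.0, 1.0]
# Hypothetical candidate views mapped to the Gaussians each one sees.
candidates = {"front": [0, 2], "side": [1, 2], "top": [0, 3]}

scene_view = next_best_view(candidates, object_probs, base_gain)
object_view = next_best_view(candidates, object_probs, base_gain, target=0)
```

With this toy data the scene-level choice is `"side"`, which sees the two most uncertain Gaussians, while restricting the score to object 0 selects `"top"`, the only view that sees an uncertain Gaussian assigned to that object. A Gaussian with a fully confident one-hot vector has zero entropy and contributes nothing, which is what steers the selection toward under-observed regions.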