SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
For semantic 3D reconstruction from sparse, pose-free images, existing methods face three key challenges in fusing pixel-level semantics: (1) prohibitive memory/storage overhead from high-dimensional semantic features; (2) degraded representational capacity due to feature compression; and (3) redundant primitive predictions in overlapping regions. This paper proposes a redundancy-aware Gaussian representation coupled with a dual-field semantic modeling framework, introducing for the first time a synergistic representation that comprises a coarse-grained, uncompressed semantic field and a fine-grained, low-dimensional relational field. The approach integrates selective Gaussian instantiation with instance-consistency-driven semantic disentanglement. Leveraging feedforward Gaussian splatting and redundancy-aware pruning, the method reduces scene parameters by 60% while surpassing state-of-the-art methods in both geometric reconstruction accuracy and semantic segmentation performance.

📝 Abstract
A major breakthrough in 3D reconstruction is the feedforward paradigm, which generates pixel-wise 3D points or Gaussian primitives from sparse, unposed images. To further incorporate semantics while avoiding the significant memory and storage costs of high-dimensional semantic features, existing methods extend this paradigm by associating each primitive with a compressed semantic feature vector. However, these methods have two major limitations: (a) the naively compressed feature compromises expressiveness, limiting the model's ability to capture fine-grained semantics, and (b) pixel-wise primitive prediction introduces redundancy in overlapping areas, causing unnecessary memory overhead. To this end, we introduce **SpatialSplat**, a feedforward framework that produces redundancy-aware Gaussians and capitalizes on a dual-field semantic representation. In particular, with the insight that primitives within the same instance exhibit high semantic consistency, we decompose the semantic representation into a coarse feature field that encodes uncompressed semantics with minimal primitives, and a fine-grained yet low-dimensional feature field that captures detailed inter-instance relationships. Moreover, we propose a selective Gaussian mechanism that retains only essential Gaussians in the scene, effectively eliminating redundant primitives. SpatialSplat learns accurate semantic information and detailed instance priors with more compact 3D Gaussians, making semantic 3D reconstruction more practical. We conduct extensive experiments to evaluate our method, demonstrating a remarkable 60% reduction in scene-representation parameters while achieving superior performance over state-of-the-art methods. The code will be made available for future investigation.
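The dual-field idea in the abstract — uncompressed semantics stored once per instance, plus a compact per-primitive relational feature — can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions: the dimensions (`D_COARSE`, `D_FINE`), the instance-assignment array, and the concatenation-based composition are all hypothetical, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N primitives, K instances,
# D_COARSE uncompressed semantic dim, D_FINE compact relational dim.
N, K, D_COARSE, D_FINE = 6, 2, 512, 8

# Coarse field: one uncompressed semantic feature per instance,
# so the full-dimensional semantics are stored with minimal primitives.
coarse_field = rng.standard_normal((K, D_COARSE))

# Fine field: a low-dimensional feature on every Gaussian primitive
# that can encode detailed inter-instance relationships.
fine_field = rng.standard_normal((N, D_FINE))

# Assumed instance assignment: which instance each primitive belongs to.
instance_id = np.array([0, 0, 1, 1, 0, 1])

def primitive_semantics(coarse_field, fine_field, instance_id):
    """Compose per-primitive semantics: each primitive inherits its
    instance's uncompressed coarse feature and keeps its own compact
    fine feature (one plausible composition, not the paper's exact one)."""
    coarse_per_prim = coarse_field[instance_id]   # (N, D_COARSE) via gather
    return np.concatenate([coarse_per_prim, fine_field], axis=1)

feats = primitive_semantics(coarse_field, fine_field, instance_id)
```

The memory saving follows directly: storing `K * D_COARSE + N * D_FINE` values instead of `N * D_COARSE` is a large reduction whenever `K << N` and `D_FINE << D_COARSE`, while primitives in the same instance still share identical full-dimensional semantics.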
Problem

Research questions and friction points this paper is trying to address.

Improves semantic 3D reconstruction from sparse unposed images
Reduces memory overhead by eliminating redundant Gaussian primitives
Enhances fine-grained semantics with dual-field representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-field semantic representation for fine-grained semantics
Selective Gaussian mechanism to eliminate redundancy
Compact 3D Gaussians with accurate semantic information
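The selective Gaussian mechanism listed above can be illustrated as score-based pruning: keep only primitives whose predicted importance exceeds a threshold, dropping duplicates from overlapping views. A minimal sketch, assuming a hypothetical per-primitive keep score (the paper's actual scoring head and threshold are not specified here):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scene: N pixel-wise predicted Gaussians.
N = 1000

# Assume the network outputs a per-primitive "keep" probability that is
# low for primitives duplicated in regions where the input views overlap.
keep_prob = rng.uniform(size=N)

def select_gaussians(keep_prob, threshold=0.6):
    """Return a boolean mask retaining only essential primitives."""
    return keep_prob > threshold

mask = select_gaussians(keep_prob)
kept = int(mask.sum())
reduction = 1.0 - kept / N  # fraction of primitives (and their parameters) removed
```

Since every pruned Gaussian removes its full parameter set (position, covariance, opacity, color, and semantic features), the fraction of primitives dropped translates directly into the scene-parameter reduction the paper reports.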