Visibility-Aware Language Aggregation for Open-Vocabulary Segmentation in 3D Gaussian Splatting

📅 2025-09-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses two challenges in open-vocabulary semantic segmentation for 3D Gaussian Splatting: (i) language-feature contamination, where background Gaussians that contribute negligibly to a pixel receive the same feature as the dominant foreground ones, and (ii) multi-view inconsistency caused by view-specific noise in language embeddings. It proposes Visibility-Aware Language Aggregation (VALA), which combines (1) a visibility-aware gate that uses per-ray marginal contributions to retain only visible Gaussians and (2) a streaming weighted geometric median in cosine space that merges noisy multi-view features. The method is described as lightweight, fast, and memory-efficient. On the reference datasets for open-vocabulary localization and segmentation, it is reported to consistently surpass existing methods.

📝 Abstract

Recently, distilling open-vocabulary language features from 2D images into 3D Gaussians has attracted significant attention. Although existing methods achieve impressive language-based interactions with 3D scenes, we observe two fundamental issues: background Gaussians that contribute negligibly to a rendered pixel receive the same feature as the dominant foreground ones, and view-specific noise in language embeddings causes multi-view inconsistencies. We introduce Visibility-Aware Language Aggregation (VALA), a lightweight yet effective method that computes marginal contributions for each ray and applies a visibility-aware gate to retain only visible Gaussians. Moreover, we propose a streaming weighted geometric median in cosine space to merge noisy multi-view features. Our method yields a robust, view-consistent language feature embedding in a fast and memory-efficient manner. VALA improves open-vocabulary localization and segmentation across reference datasets, consistently surpassing existing works.
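The visibility-aware gate mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a Gaussian's "marginal contribution" to a ray is its standard front-to-back alpha-blending weight (its opacity times the transmittance remaining in front of it). The function names and the `min_weight` threshold are hypothetical, chosen for illustration only; the paper's actual gating rule may differ.

```python
def marginal_contributions(alphas):
    """Blending weight of each Gaussian along one ray.

    `alphas` are per-Gaussian opacities in [0, 1], sorted front to back.
    Weight i is alpha_i times the transmittance left after Gaussians 0..i-1.
    """
    weights = []
    transmittance = 1.0
    for alpha in alphas:
        weights.append(alpha * transmittance)
        transmittance *= 1.0 - alpha
    return weights


def visibility_gate(alphas, min_weight=0.1):
    """Indices of Gaussians whose contribution reaches `min_weight`.

    `min_weight` is a hypothetical threshold, not a value from the paper.
    """
    weights = marginal_contributions(alphas)
    return [i for i, w in enumerate(weights) if w >= min_weight]


# A faint front Gaussian, a dominant one, and an occluded background one.
print(visibility_gate([0.05, 0.9, 0.8]))  # [1]
```

In this toy ray, the third Gaussian has a high opacity of its own, yet little transmittance reaches it, so it falls below the threshold and would not receive the pixel's language feature.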

Problem

Research questions and friction points this paper is trying to address.

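Addresses background Gaussians that contribute negligibly to a pixel yet receive the same language feature as foreground ones

Reduces multi-view inconsistency caused by view-specific noise in language embeddings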

Improves open-vocabulary localization and segmentation accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Visibility-aware gate retains only visible Gaussians, based on per-ray marginal contributions

Streaming weighted geometric median in cosine space merges noisy multi-view features

Produces a robust, view-consistent language embedding in a fast and memory-efficient manner
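To make the idea of a streaming weighted geometric median more concrete, here is a rough single-pass sketch. It is not the paper's algorithm. It assumes features are compared as unit vectors using the chordal distance sqrt(2 - 2·cos), and it applies a one-pass variant of the Weiszfeld reweighting, in which each new view is down-weighted by its distance to the current estimate. The class name, the `eps` guard, and the update rule are all illustrative assumptions.

```python
import math


def _normalize(v):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


class StreamingCosineMedian:
    """One-pass approximation of a weighted geometric median on the unit sphere.

    Hypothetical helper: stores only a running accumulator, so memory does
    not grow with the number of views. Assumes nonzero input features.
    """

    def __init__(self, eps=1e-6):
        self.eps = eps          # guards against division by zero
        self.accum = None       # running sum of reweighted unit features
        self.estimate = None    # current unit-length estimate

    def update(self, feature, weight):
        x = _normalize(feature)
        if self.estimate is None:
            self.accum = [weight * c for c in x]
        else:
            cos = sum(a * b for a, b in zip(self.estimate, x))
            # Chordal distance between unit vectors, a function of cosine only.
            dist = max(math.sqrt(max(2.0 - 2.0 * cos, 0.0)), self.eps)
            scale = weight / dist  # far-away (outlier) views count less
            self.accum = [a + scale * c for a, c in zip(self.accum, x)]
        self.estimate = _normalize(self.accum)
        return self.estimate


median = StreamingCosineMedian()
median.update([1.0, 0.1], weight=1.0)
median.update([1.0, -0.1], weight=1.0)
print(median.update([0.0, 1.0], weight=1.0))  # third view is an outlier
```

With two agreeing views and one outlier, the estimate stays close to the agreeing pair, whereas a plain normalized weighted mean would be pulled noticeably toward the outlier. Because each view is weighted against the estimate available when it arrives, the result depends on view order; an exact geometric median would require revisiting earlier views.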

🔎 Similar Papers

3D Vision-Language Gaussian Splatting