🤖 AI Summary
This work addresses task-driven active exploration under open-vocabulary instructions (e.g., “find a person”) by proposing a task-aware real-time semantic 3D reconstruction and navigation framework. Methodologically, it introduces a viewpoint-semantic coverage metric that jointly optimizes geometric viewpoint diversity and instruction-semantic relevance to guide information-gain-maximizing trajectory planning. The framework integrates online semantic Gaussian splatting reconstruction, forward-field-of-view–constrained planning, and open-vocabulary vision-language understanding to enable end-to-end task-oriented 3D mapping and navigation. Evaluation demonstrates improvements over FisherRF and Bayes’ Rays in computation speed and reconstruction quality on static benchmarks; in quadrotor hardware experiments, search success rate on challenging maps increases sixfold; and cross-platform deployment is validated on both an aerial drone and a quadruped robot.
📝 Abstract
We present VISTA (Viewpoint-based Image selection with Semantic Task Awareness), an active exploration method for robots to plan informative trajectories that improve 3D map quality in areas most relevant for task completion. Given an open-vocabulary search instruction (e.g., "find a person"), VISTA enables a robot to explore its environment to search for the object of interest, while simultaneously building a real-time semantic 3D Gaussian Splatting reconstruction of the scene. The robot navigates its environment by planning receding-horizon trajectories that prioritize semantic similarity to the query and exploration of unseen regions of the environment. To evaluate trajectories, VISTA introduces a novel, efficient viewpoint-semantic coverage metric that quantifies both the geometric view diversity and task relevance in the 3D scene. On static datasets, our coverage metric outperforms state-of-the-art baselines, FisherRF and Bayes' Rays, in computation speed and reconstruction quality. In quadrotor hardware experiments, VISTA achieves 6x higher success rates in challenging maps, compared to baseline methods, while matching baseline performance in less challenging maps. Lastly, we show that VISTA is platform-agnostic by deploying it on a quadrotor drone and a Spot quadruped robot. Open-source code will be released upon acceptance of the paper.
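The abstract's viewpoint-semantic coverage idea — rewarding candidate views that are both geometrically novel and semantically relevant to the search query — can be illustrated with a toy score. This is a minimal sketch under my own assumptions (the function name, the cosine-based novelty term, and the linear blend are illustrative choices, not the metric defined in the paper):

```python
# Illustrative sketch only: the actual VISTA coverage metric is defined in
# the paper. The weighting and novelty term here are assumptions.
import numpy as np

def coverage_score(seen_dirs, cand_dirs, semantic_relevance, alpha=0.5):
    """Toy viewpoint-semantic coverage score for a candidate trajectory.

    seen_dirs:          (N, 3) unit view directions already captured
    cand_dirs:          (M, 3) unit view directions the candidate adds
    semantic_relevance: (M,) similarity of each candidate view to the query
    alpha:              trade-off between view novelty and task relevance
    """
    seen = np.asarray(seen_dirs, dtype=float)
    cand = np.asarray(cand_dirs, dtype=float)
    rel = np.asarray(semantic_relevance, dtype=float)

    if len(seen) == 0:
        # Nothing seen yet: every candidate direction is maximally novel.
        novelty = np.ones(len(cand))
    else:
        # Geometric novelty: 1 - max cosine similarity to any seen
        # direction, so a direction identical to one already captured
        # contributes no new view diversity.
        cos = cand @ seen.T                      # (M, N) cosine similarities
        novelty = 1.0 - np.clip(cos.max(axis=1), 0.0, 1.0)

    # Blend geometric novelty with semantic relevance per candidate view.
    return float(np.sum(alpha * novelty + (1 - alpha) * rel))

# With this toy score, a candidate that looks in a new direction beats one
# that revisits an already-seen direction at equal semantic relevance:
revisit = coverage_score([[1, 0, 0]], [[1, 0, 0]], [0.9])
new_dir = coverage_score([[1, 0, 0]], [[0, 1, 0]], [0.9])
```

In a receding-horizon planner, a score like this would be evaluated per candidate trajectory and the highest-scoring one executed, replanning as the semantic map is updated.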