🤖 AI Summary
This work addresses task-driven active exploration under open-vocabulary instructions (e.g., “find a person”) by proposing a task-aware real-time semantic 3D reconstruction and navigation framework. Methodologically, it introduces a viewpoint-semantic coverage metric that jointly optimizes geometric viewpoint diversity and instruction-semantic relevance to guide information-gain-maximizing trajectory planning. The framework integrates online semantic Gaussian splatting reconstruction, forward-field-of-view–constrained planning, and open-vocabulary vision-language understanding to enable end-to-end task-oriented 3D mapping and navigation. Evaluation demonstrates improvements over FisherRF and Bayes’ Rays in computation speed and reconstruction quality on static benchmarks; in quadrotor hardware experiments, search success rate on challenging maps increases sixfold; and cross-platform deployment is validated on both an aerial drone and a quadruped robot.
📝 Abstract
We present VISTA (Viewpoint-based Image selection with Semantic Task Awareness), an active exploration method for robots to plan informative trajectories that improve 3D map quality in areas most relevant for task completion. Given an open-vocabulary search instruction (e.g., "find a person"), VISTA enables a robot to explore its environment to search for the object of interest, while simultaneously building a real-time semantic 3D Gaussian Splatting reconstruction of the scene. The robot navigates its environment by planning receding-horizon trajectories that prioritize semantic similarity to the query and exploration of unseen regions of the environment. To evaluate trajectories, VISTA introduces a novel, efficient viewpoint-semantic coverage metric that quantifies both the geometric view diversity and task relevance in the 3D scene. On static datasets, our coverage metric outperforms state-of-the-art baselines, FisherRF and Bayes' Rays, in computation speed and reconstruction quality. In quadrotor hardware experiments, VISTA achieves 6x higher success rates in challenging maps, compared to baseline methods, while matching baseline performance in less challenging maps. Lastly, we show that VISTA is platform-agnostic by deploying it on a quadrotor drone and a Spot quadruped robot. Open-source code will be released upon acceptance of the paper.
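The abstract's viewpoint-semantic coverage idea — rewarding candidate views that are both geometrically novel and semantically relevant to the search query — can be illustrated with a toy score. This is a minimal sketch under my own assumptions (the function name, the cosine-based novelty term, and the linear blend are illustrative choices, not the metric defined in the paper):

```python
# Illustrative sketch only: the actual VISTA coverage metric is defined in
# the paper. The weighting and novelty term here are assumptions.
import numpy as np

def coverage_score(seen_dirs, cand_dirs, semantic_relevance, alpha=0.5):
    """Toy viewpoint-semantic coverage score for a candidate trajectory.

    seen_dirs:          (N, 3) unit view directions already captured
    cand_dirs:          (M, 3) unit view directions the candidate adds
    semantic_relevance: (M,) similarity of each candidate view to the query
    alpha:              trade-off between view novelty and task relevance
    """
    seen = np.asarray(seen_dirs, dtype=float)
    cand = np.asarray(cand_dirs, dtype=float)
    rel = np.asarray(semantic_relevance, dtype=float)

    if len(seen) == 0:
        # Nothing seen yet: every candidate direction is maximally novel.
        novelty = np.ones(len(cand))
    else:
        # Geometric novelty: 1 - max cosine similarity to any seen
        # direction, so a direction identical to one already captured
        # contributes no new view diversity.
        cos = cand @ seen.T                      # (M, N) cosine similarities
        novelty = 1.0 - np.clip(cos.max(axis=1), 0.0, 1.0)

    # Blend geometric novelty with semantic relevance per candidate view.
    return float(np.sum(alpha * novelty + (1 - alpha) * rel))

# With this toy score, a candidate that looks in a new direction beats one
# that revisits an already-seen direction at equal semantic relevance:
revisit = coverage_score([[1, 0, 0]], [[1, 0, 0]], [0.9])
new_dir = coverage_score([[1, 0, 0]], [[0, 1, 0]], [0.9])
```

In a receding-horizon planner, a score like this would be evaluated per candidate trajectory and the highest-scoring one executed, replanning as the semantic map is updated.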