OVAMOS: A Framework for Open-Vocabulary Multi-Object Search in Unknown Environments

📅 2025-03-03
🤖 AI Summary
This paper addresses open-vocabulary multi-object search in unknown indoor environments, where unstable observations, target occlusion, incomplete exploration, and observation uncertainty make the task difficult. It proposes a unified framework that combines vision-language models (VLMs) for semantic reasoning, frontier-based exploration for navigation, and a partially observable Markov decision process (POMDP) for modeling observation uncertainty. The VLM infers object-environment relationships to guide the search, and the POMDP lets the robot replan and recover from failures in occluded and cluttered scenes. Experiments on 120 simulated scenarios across several HM3D scenes and in a 50 m² real-world office show significant improvements in search success rate and efficiency over baseline methods.

📝 Abstract
Object search is a fundamental task for robots deployed in indoor building environments, yet challenges arise due to observation instability, especially for open-vocabulary models. While foundation models (LLMs/VLMs) enable reasoning about object locations even without direct visibility, the ability to recover from failures and replan remains crucial. The Multi-Object Search (MOS) problem further increases complexity, requiring tracking multiple objects and thorough exploration in novel environments, making observation uncertainty a significant obstacle. To address these challenges, we propose a framework integrating VLM-based reasoning, frontier-based exploration, and a Partially Observable Markov Decision Process (POMDP) to solve the MOS problem in novel environments. The VLM enhances search efficiency by inferring object-environment relationships, frontier-based exploration guides navigation in unknown spaces, and the POMDP models observation uncertainty, allowing recovery from failures in occluded and cluttered environments. We evaluate our framework on 120 simulated scenarios across several Habitat-Matterport3D (HM3D) scenes and a real-world robot experiment in a 50-square-meter office, demonstrating significant improvements in both efficiency and success rate over baseline methods.
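To illustrate how a POMDP-style belief can absorb unreliable detections, the following is a minimal sketch of a Bayesian belief update for one target over a small set of grid cells. It is not the authors' implementation: the function name `update_belief`, the simplified observation model, and the detector rates `tpr` and `fpr` are all assumptions chosen for illustration.

```python
def update_belief(belief, visible, detected_cell, tpr=0.8, fpr=0.05):
    """Bayes update of a belief over target locations (illustrative sketch).

    belief        : dict mapping cell -> prior probability of the target being there
    visible       : set of cells inside the current field of view
    detected_cell : cell where the detector fired, or None if nothing was detected
    tpr, fpr      : assumed true-positive and false-positive rates of the detector
    """
    posterior = {}
    for cell, prior in belief.items():
        if detected_cell is None:
            # A miss is likely only if the target is outside the view
            # (or the detector failed while the target was in view).
            likelihood = (1.0 - tpr) if cell in visible else (1.0 - fpr)
        else:
            # A detection supports the reported cell; any other cell
            # explains it only as a false positive.
            likelihood = tpr if cell == detected_cell else fpr
        posterior[cell] = prior * likelihood

    total = sum(posterior.values())
    return {cell: p / total for cell, p in posterior.items()}


# Uniform prior over four cells; the robot currently sees cells 0 and 1.
prior = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
after_miss = update_belief(prior, visible={0, 1}, detected_cell=None)
after_hit = update_belief(prior, visible={0, 1}, detected_cell=1)
```

After a missed detection, probability mass shifts toward the unseen cells without dropping to zero in the viewed ones, which is what allows a search to return to a previously observed area if the target was occluded. For multiple objects, one such belief could be kept per target.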
Problem

Research questions and friction points this paper is trying to address.

Addresses open-vocabulary multi-object search in unknown environments
Integrates VLM-based reasoning and POMDP for observation uncertainty
Improves efficiency and success rate in robot object search tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

VLM-based reasoning for object-environment relationships
Frontier-based exploration for unknown space navigation
POMDP framework for handling observation uncertainty
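As a rough illustration of frontier-based exploration, the sketch below finds frontier cells (free cells bordering unknown space) in a small occupancy grid and picks one by trading off a semantic score against travel distance. The grid encoding, the `semantic_score` values standing in for VLM output, and the weighting in `choose_frontier` are hypothetical and not taken from the paper.

```python
FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def find_frontiers(grid):
    """Return free cells that have at least one unknown 4-neighbor."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


def choose_frontier(frontiers, robot, semantic_score, distance_weight=0.1):
    """Pick the frontier with the best score minus a Manhattan-distance penalty."""
    def utility(cell):
        distance = abs(cell[0] - robot[0]) + abs(cell[1] - robot[1])
        return semantic_score.get(cell, 0.0) - distance_weight * distance
    return max(frontiers, key=utility)


grid = [
    [FREE, FREE,     UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE,     FREE],
]
frontiers = find_frontiers(grid)

# Hypothetical scores, e.g. how likely a VLM judges the target to lie beyond each frontier.
scores = {(0, 1): 0.2, (2, 2): 0.9}
goal = choose_frontier(frontiers, robot=(0, 0), semantic_score=scores)
```

In this example the farther frontier wins because its assumed semantic score outweighs the distance penalty, which is the kind of trade-off a semantically guided explorer has to make.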
Qianwei Wang
University of Michigan, Ann Arbor, MI 48109, USA
Yifan Xu
University of Michigan, Ann Arbor, MI 48109, USA
Vineet Kamat
University of Michigan, Ann Arbor, MI 48109, USA
Carol Menassa
Professor of Civil and Environmental Engineering, University of Michigan
Sustainable Construction, Simulation, Human Infrastructure Interaction, Finance