🤖 AI Summary
Problem: Existing assistive systems for blind and visually impaired individuals typically focus on basic navigation or obstacle avoidance rather than open-vocabulary object localization, and struggle with efficient, robust multi-object search in complex, partially observable indoor environments.
Method: We propose an active search framework that integrates vision-language models (VLMs) with a partially observable Markov decision process (POMDP) planner, using VLM-based scene-relational reasoning together with a novel value decay mechanism and belief-space reasoning to enable recovery from missed detections and adaptive target prioritization.
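As a rough illustration of the value decay idea, the sketch below shows one way a per-target search priority could be down-weighted after a missed detection and allowed to recover over time. The grid-cell belief, decay constants, and method names are assumptions for illustration, not the paper's actual formulation.

```python
import math

class TargetBelief:
    """Hypothetical per-target belief with value decay (illustrative sketch)."""

    def __init__(self, num_cells: int, decay_rate: float = 0.6):
        self.probs = [1.0 / num_cells] * num_cells  # uniform prior over map cells
        self.value = 1.0                            # current search priority
        self.decay_rate = decay_rate

    def on_missed_detection(self, cell: int) -> None:
        # Down-weight the cell where the detector failed and renormalize,
        # then decay this target's priority so the planner can defer it
        # in favor of other targets instead of looping on the same spot.
        self.probs[cell] *= 0.2
        total = sum(self.probs)
        self.probs = [p / total for p in self.probs]
        self.value *= self.decay_rate

    def step(self) -> None:
        # Let priority recover gradually so missed targets are revisited later.
        self.value = min(1.0, self.value * math.exp(0.05))
```

Decaying and then slowly restoring priority keeps the planner from repeatedly re-inspecting a location where detection just failed, while still guaranteeing that a missed object is searched for again later.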
Contribution/Results: By combining frontier-based exploration with open-vocabulary recognition, our system substantially improves multi-object retrieval success rates and search efficiency in both simulated and real-world settings. Experiments demonstrate its effectiveness, robustness, and scalability in cluttered, dynamic indoor environments, particularly under partial observability and occlusion.
📝 Abstract
Indoor built environments such as homes and offices often present complex, cluttered layouts that pose significant challenges for individuals who are blind or visually impaired, especially when performing tasks that involve locating and gathering multiple objects. While many existing assistive technologies focus on basic navigation or obstacle avoidance, few systems provide scalable and efficient multi-object search in real-world, partially observable settings. To address this gap, we introduce OpenGuide, an assistive mobile robot system that combines natural language understanding with vision-language foundation models (VLMs), frontier-based exploration, and a Partially Observable Markov Decision Process (POMDP) planner. OpenGuide interprets open-vocabulary requests, reasons about object-scene relationships, and adaptively navigates novel environments to localize multiple target items. Our approach enables robust recovery from missed detections through value decay and belief-space reasoning, resulting in more effective exploration and object localization. We validate OpenGuide in simulated and real-world experiments, demonstrating substantial improvements in task success rate and search efficiency over prior methods. This work establishes a foundation for scalable, human-centered robotic assistance in assisted living environments.
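To make the frontier-based exploration concrete, here is a minimal sketch of greedy frontier selection that trades off POMDP belief mass, VLM-derived semantic relevance, and travel distance. The utility form, weights, and field names are illustrative assumptions, not OpenGuide's actual scoring function.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Frontier:
    x: float
    y: float
    semantic_score: float  # VLM-derived relevance of the region to the query
    belief_mass: float     # belief probability mass near this frontier

def select_frontier(frontiers: List[Frontier],
                    robot_xy: Tuple[float, float],
                    alpha: float = 0.7) -> Frontier:
    """Greedy next-best-frontier: mix belief mass with semantic relevance,
    discounted by travel distance. All weights are illustrative."""
    def utility(f: Frontier) -> float:
        dist = ((f.x - robot_xy[0]) ** 2 + (f.y - robot_xy[1]) ** 2) ** 0.5
        return (alpha * f.belief_mass + (1 - alpha) * f.semantic_score) / (1.0 + dist)
    return max(frontiers, key=utility)

# Example: a nearby frontier with strong semantic cues beats a distant one
# that carries slightly more raw belief mass.
candidates = [Frontier(2.0, 1.0, semantic_score=0.8, belief_mass=0.3),
              Frontier(8.0, 5.0, semantic_score=0.4, belief_mass=0.6)]
best = select_frontier(candidates, robot_xy=(0.0, 0.0))
```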