Efficient Navigation in Unknown Indoor Environments with Vision-Language Models

📅 2025-10-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Autonomous navigation in unknown indoor environments suffers from inefficient path planning and susceptibility to local minima, because conventional exploration methods have weak global reasoning and rely too heavily on local heuristic rules. Method: This paper proposes a high-level planning framework in which a Vision-Language Model (VLM) performs zero-shot spatial-structural reasoning over incomplete occupancy grid maps. At each planning step the framework generates a partial 2D map, proposes candidate subgoals, and has the VLM rank them by weighing progress toward the goal against the risk of entering unknown space. The planner is coupled with the DYNUS trajectory planner for end-to-end navigation. Contribution/Results: By replacing conventional greedy subgoal selection, the method reduces common greedy failures such as detours into small rooms. Simulation results show paths that are about 10% shorter on average.

📝 Abstract

We present a novel high-level planning framework that leverages vision-language models (VLMs) to improve autonomous navigation in unknown indoor environments with many dead ends. Traditional exploration methods often take inefficient routes due to limited global reasoning and reliance on local heuristics. In contrast, our approach enables a VLM to reason directly about an occupancy map in a zero-shot manner, selecting subgoals that are likely to lead to more efficient paths. At each planning step, we convert a 3D occupancy grid into a partial 2D map of the environment, and generate candidate subgoals. Each subgoal is then evaluated and ranked against other candidates by the model. We integrate this planning scheme into DYNUS \cite{kondo2025dynus}, a state-of-the-art trajectory planner, and demonstrate improved navigation efficiency in simulation. The VLM infers structural patterns (e.g., rooms, corridors) from incomplete maps and balances the need to make progress toward a goal against the risk of entering unknown space. This reduces common greedy failures (e.g., detouring into small rooms) and achieves about 10% shorter paths on average.
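The map-conversion and candidate-generation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cell encoding (`FREE`, `OCCUPIED`, `UNKNOWN`), the column-wise projection rule, and the use of frontier cells (free cells bordering unknown space) as candidate subgoals are all assumptions made here, since the abstract does not specify how either step is done.

```python
import numpy as np

# Cell labels used in this sketch (an assumed encoding, not taken from the paper).
FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def project_to_2d(grid3d: np.ndarray) -> np.ndarray:
    """Collapse an (X, Y, Z) occupancy grid into an (X, Y) map.

    A column is OCCUPIED if any voxel in it is occupied, FREE if at least one
    voxel is known free and none is occupied, and UNKNOWN otherwise.
    """
    occupied = (grid3d == OCCUPIED).any(axis=2)
    free = (grid3d == FREE).any(axis=2)
    grid2d = np.full(grid3d.shape[:2], UNKNOWN, dtype=np.int8)
    grid2d[free] = FREE
    grid2d[occupied] = OCCUPIED  # occupancy overrides free space
    return grid2d


def frontier_cells(grid2d: np.ndarray) -> list[tuple[int, int]]:
    """Return FREE cells with at least one UNKNOWN 4-neighbour.

    These frontier cells stand in for the candidate subgoals (assumption).
    """
    # Pad with OCCUPIED so the map border never counts as unknown space.
    padded = np.pad(grid2d, 1, constant_values=OCCUPIED)
    unknown = padded == UNKNOWN
    near_unknown = (
        unknown[:-2, 1:-1]    # neighbour in the previous row
        | unknown[2:, 1:-1]   # neighbour in the next row
        | unknown[1:-1, :-2]  # neighbour in the previous column
        | unknown[1:-1, 2:]   # neighbour in the next column
    )
    rows, cols = np.nonzero((grid2d == FREE) & near_unknown)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]
```

In practice the frontier cells would likely be clustered into a handful of candidates before being shown to the model; that step is omitted here to keep the sketch short.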

Problem

Research questions and friction points this paper is trying to address.

Improving autonomous navigation efficiency in unknown indoor environments

Reducing greedy failures like detouring into dead ends

Selecting subgoals likely to yield efficient paths by having a vision-language model reason over occupancy maps

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages vision-language models for zero-shot occupancy map reasoning

Generates and ranks candidate subgoals from partial 2D maps

Integrates VLM planning with DYNUS, a state-of-the-art trajectory planner
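The ranking step listed above might look something like the sketch below. The source does not give the prompt, the map rendering, or the model interface, so everything here is hypothetical: the map is rendered as ASCII text rather than an image so the example runs without an image pipeline, `query_model` is a placeholder callable standing in for whatever VLM client is used, and the single-letter labelling limits this sketch to 26 candidates.

```python
from typing import Callable, Sequence

Cell = tuple[int, int]
# Assumed encoding: 0 = free, 1 = occupied, -1 = unknown.
SYMBOLS = {0: ".", 1: "#", -1: "?"}


def render_map(grid: Sequence[Sequence[int]], robot: Cell, goal: Cell,
               candidates: Sequence[Cell]) -> str:
    """Render the partial map as text with robot (R), goal (G), and labelled candidates."""
    rows = [[SYMBOLS[v] for v in row] for row in grid]
    for i, (r, c) in enumerate(candidates):
        rows[r][c] = chr(ord("A") + i)  # A, B, C, ... (at most 26 candidates)
    rows[robot[0]][robot[1]] = "R"
    rows[goal[0]][goal[1]] = "G"
    return "\n".join("".join(row) for row in rows)


def rank_subgoals(grid: Sequence[Sequence[int]], robot: Cell, goal: Cell,
                  candidates: Sequence[Cell],
                  query_model: Callable[[str], str]) -> list[Cell]:
    """Ask a model to order the candidates and return them best-first.

    `query_model` is a hypothetical stand-in for a VLM call: prompt in, text out.
    """
    labels = [chr(ord("A") + i) for i in range(len(candidates))]
    prompt = (
        "Map legend: . free, # occupied, ? unknown, R robot, G goal.\n"
        f"Candidate subgoals: {', '.join(labels)}.\n"
        "Rank the candidates from best to worst, balancing progress toward G "
        "against the risk of entering a dead end. "
        "Reply with the labels only, comma-separated.\n\n"
        + render_map(grid, robot, goal, candidates)
    )
    reply = query_model(prompt)

    ranked: list[Cell] = []
    for token in reply.replace(",", " ").split():
        label = token.strip().upper()
        if label in labels:
            cell = candidates[labels.index(label)]
            if cell not in ranked:  # ignore repeated labels
                ranked.append(cell)
    # Append anything the model left out so the caller always gets a full ordering.
    ranked += [c for c in candidates if c not in ranked]
    return ranked
```

The top-ranked cell would then be handed to the trajectory planner as the next subgoal. Falling back to the original candidate order when the reply is empty or malformed is a design choice of this sketch, so a bad model response degrades to the baseline ordering instead of stalling navigation.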
