GRIP: A Unified Framework for Grid-Based Relay and Co-Occurrence-Aware Planning in Dynamic Environments

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address insufficient generalization in robotic navigation under dynamic, cluttered, and semantically complex partially observable environments, this paper proposes GRIP—a unified framework integrating perception, symbolic reasoning, and spatial planning. Its key contributions are: (1) a grid-based relaying mechanism coupled with co-occurrence-aware symbolic planning, enabling multi-hop anchoring chains; (2) introspective reasoning via large language models to enhance decision interpretability and robustness; and (3) a hybrid execution strategy combining semantic occupancy grids, open-vocabulary localization, behavior cloning, and grid-conditioned control. Evaluated on AI2-THOR and RoboTHOR, GRIP achieves a 9.6% absolute improvement in long-horizon task success rate, with SPL and SAE metrics more than doubling. Real-world deployment on JetBot further validates its strong generalization against perceptual noise and environmental dynamics.

📝 Abstract
Robots navigating dynamic, cluttered, and semantically complex environments must integrate perception, symbolic reasoning, and spatial planning to generalize across diverse layouts and object categories. Existing methods often rely on static priors or limited memory, constraining adaptability under partial observability and semantic ambiguity. We present GRIP, Grid-based Relay with Intermediate Planning, a unified, modular framework with three scalable variants: GRIP-L (Lightweight), optimized for symbolic navigation via semantic occupancy grids; GRIP-F (Full), supporting multi-hop anchor chaining and LLM-based introspection; and GRIP-R (Real-World), enabling physical robot deployment under perceptual uncertainty. GRIP integrates dynamic 2D grid construction, open-vocabulary object grounding, co-occurrence-aware symbolic planning, and hybrid policy execution using behavioral cloning, D* search, and grid-conditioned control. Empirical results on AI2-THOR and RoboTHOR benchmarks show that GRIP achieves up to 9.6% higher success rates and over 2× improvement in path efficiency (SPL and SAE) on long-horizon tasks. Qualitative analyses reveal interpretable symbolic plans in ambiguous scenes. Real-world deployment on a JetBot further validates GRIP's generalization under sensor noise and environmental variation. These results position GRIP as a robust, scalable, and explainable framework bridging simulation and real-world navigation.
Problem

Research questions and friction points this paper is trying to address.

Robots struggle to navigate dynamic, cluttered, and semantically complex environments
Existing methods lack adaptability under partial observability and semantic ambiguity
Current approaches underperform on long-horizon tasks that require spatial reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic 2D grid construction for environment representation
Co-occurrence-aware symbolic planning with open-vocabulary grounding
Hybrid policy execution combining D* search, behavioral cloning, and grid-conditioned control
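To make the co-occurrence-aware planning idea concrete, here is a minimal sketch of scoring free grid cells as relay anchors using semantic co-occurrence priors. This is an illustrative assumption, not the paper's implementation: the `CO_OCCURRENCE` table, grid encoding, and function names are all hypothetical.

```python
# Illustrative sketch (assumed, not GRIP's actual code): rank free cells
# of a 2D semantic occupancy grid as candidate relay anchors for a target
# object, using hypothetical co-occurrence priors.
from collections import defaultdict

# Hypothetical priors: likelihood of finding the target near each anchor label.
CO_OCCURRENCE = {
    ("mug", "coffee_machine"): 0.9,
    ("mug", "sink"): 0.6,
    ("mug", "sofa"): 0.1,
}

def score_cells(grid, target):
    """Score each free cell by the strongest co-occurrence prior of any
    adjacent semantic label; high scores become intermediate waypoints.

    grid: dict mapping (row, col) -> semantic label ('free' or object name).
    Returns cells sorted by descending score.
    """
    scores = defaultdict(float)
    for (r, c), label in grid.items():
        if label != "free":
            continue  # only free cells can serve as waypoints
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                neighbor = grid.get((r + dr, c + dc))
                if neighbor and neighbor != "free":
                    prior = CO_OCCURRENCE.get((target, neighbor), 0.0)
                    scores[(r, c)] = max(scores[(r, c)], prior)
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Tiny 2x2 example: the free cell next to the coffee machine should
# outrank or tie any cell whose best neighbor is only the sofa.
grid = {
    (0, 0): "coffee_machine", (0, 1): "free",
    (1, 0): "free",           (1, 1): "sofa",
}
ranked = score_cells(grid, "mug")
```

In a full relay pipeline, the top-ranked cells would feed a planner (e.g. D* search) as intermediate goals, chaining multiple anchors when the target is not directly observable.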