EvoMemNav: Efficient Self-Evolving Fine-Grained Memory for Zero-Shot Embodied Navigation

📅 2026-06-02

🤖 AI Summary
This work addresses the trade-off between efficiency and accuracy when building long-term memory for zero-shot embodied navigation: sparse scene graphs lose fine-grained details, while dense 3D reconstructions incur prohibitive computational costs. The authors propose EvoMemNav, which treats raw views as the primary memory units and organizes them into a hierarchical Visual-Semantic Memory Graph (VSMGraph) spanning rooms, views, and objects, linked by lightweight semantic and topological relations. A budgeted coarse-to-fine reasoning strategy controls computational overhead, and a post-task reflection mechanism updates environmental priors attached to the graph. Without retraining, EvoMemNav improves continually in the zero-shot setting and reports consistent gains in Success Rate (SR) and Success weighted by Path Length (SPL) across object, text-description, and image goals on GOAT-Bench and HM3D, with better multi-instance disambiguation, fewer premature stops, and stronger zero-shot generalization.
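The room-view-object hierarchy described in this summary can be pictured with a small data structure. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the names `ViewNode`, `RoomNode`, `MemoryGraph`, `add_view`, and `candidates` are hypothetical, object labels stand in for the "lightweight semantic cues", and topological relations are reduced to links between consecutively visited views.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ViewNode:
    view_id: int
    image_path: str  # the raw view itself is kept as the memory unit (assumed: stored on disk)
    objects: set[str] = field(default_factory=set)    # assumed form of the semantic cues
    neighbors: set[int] = field(default_factory=set)  # topological links to other views


@dataclass
class RoomNode:
    name: str
    view_ids: list[int] = field(default_factory=list)


class MemoryGraph:
    """Hypothetical room -> view -> object hierarchy."""

    def __init__(self) -> None:
        self.rooms: dict[str, RoomNode] = {}
        self.views: dict[int, ViewNode] = {}

    def add_view(self, room: str, view: ViewNode,
                 prev_id: Optional[int] = None) -> None:
        # Attach the view to its room, creating the room on first use.
        self.rooms.setdefault(room, RoomNode(room)).view_ids.append(view.view_id)
        self.views[view.view_id] = view
        # Link to the previously visited view, if it is known.
        if prev_id is not None and prev_id in self.views:
            view.neighbors.add(prev_id)
            self.views[prev_id].neighbors.add(view.view_id)

    def candidates(self, label: str) -> list[tuple[str, int]]:
        # All (room, view) pairs whose semantic cues mention the label.
        return [
            (room.name, vid)
            for room in self.rooms.values()
            for vid in room.view_ids
            if label in self.views[vid].objects
        ]
```

Because each candidate points back to a stored raw view, a later verification step can inspect the original image rather than a compressed detector node, which is the property the summary highlights for multi-instance disambiguation.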
📝 Abstract
Building memory is essential for long-horizon planning in zero-shot embodied navigation. Detector-centric scene graphs often compress observations into sparse nodes, discarding fine-grained visual evidence and accumulating noise, while 3D reconstruction-based methods remain computationally prohibitive. We present EvoMemNav, an efficient, self-evolving, fine-grained memory framework for zero-shot embodied navigation. EvoMemNav constructs a Visual-Semantic Memory Graph (VSMGraph) that keeps raw views as first-class memory and organizes them with lightweight semantic cues and topological relations into a room-view-object hierarchy, preserving fine-grained details for disambiguation and Stop verification. To scale to growing memory, we introduce a budgeted coarse-to-fine policy: a coarse stage compresses the search space into promising regions, and a fine stage invokes a VLM only for targeted verification and decision. Beyond static memories, EvoMemNav performs reflection-driven write-back after each subtask, updating graph-attached priors that encode accumulated environmental knowledge to refine future decisions without retraining. Experiments on GOAT-Bench and HM3D across object, text-description, and image-goal modalities show consistent gains in SR/SPL, with better multi-instance disambiguation, fewer premature stops, and stronger zero-shot generalization.
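To make the budgeted coarse-to-fine policy and the reflection-driven write-back more concrete, here is a minimal sketch under several assumptions. It is not the paper's algorithm: the scoring rule (label hits plus a stored prior), the `verify` callable standing in for a VLM call, the `budget` counted in verification calls, and the additive prior update are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Callable, Optional

# Assumed memory layout: region name -> list of (view_id, labels seen in that view).
Regions = dict[str, list[tuple[int, set[str]]]]
Priors = defaultdict  # maps (region, goal) -> float, default 0.0


def coarse_to_fine_search(
    regions: Regions,
    goal: str,
    priors: Priors,
    verify: Callable[[int, str], bool],  # placeholder for an expensive VLM check
    budget: int,
) -> tuple[Optional[tuple[str, int]], int]:
    """Return ((region, view_id) or None, number of verify calls used)."""

    # Coarse stage: rank regions with cheap cues only (no verifier calls).
    def region_score(name: str) -> float:
        hits = sum(1 for _, labels in regions[name] if goal in labels)
        return hits + priors[(name, goal)]

    ranked = sorted(regions, key=region_score, reverse=True)

    # Fine stage: spend the limited budget on targeted verification.
    calls = 0
    for name in ranked:
        for view_id, labels in regions[name]:
            if goal not in labels:
                continue
            if calls >= budget:
                return None, calls
            calls += 1
            if verify(view_id, goal):
                return (name, view_id), calls
    return None, calls


def reflect(priors: Priors, region: str, goal: str,
            success: bool, step: float = 1.0) -> None:
    # Write-back after a subtask: nudge the prior attached to this region.
    priors[(region, goal)] += step if success else -step
```

In this toy setup, `priors` would be created as `defaultdict(float)`. A region that was confirmed to contain the goal in an earlier subtask gets a higher prior, so it is ranked earlier in the coarse stage next time and fewer verifier calls are needed, all without changing any model weights.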
Problem

Research questions and friction points this paper is trying to address.

zero-shot embodied navigation
fine-grained memory
long-horizon planning
visual evidence preservation
memory efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-evolving memory
fine-grained memory
visual-semantic memory graph
zero-shot embodied navigation
coarse-to-fine retrieval