TARIC: Memory-Augmented Traversability-Aware Outdoor VLN under Interrupted Semantic Cues

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses disorientation in long-range outdoor vision-and-language navigation caused by semantic cues that become sparse, occluded, or leave the field of view. The authors propose a world-aligned 3D cue memory combined with real-time near-field traversability analysis. Traversability is treated as a stability condition for sustaining goal-directed navigation rather than only a local safety check, and an uncertainty-aware memory readout keeps guidance feasible and goal-consistent while semantic cues are absent. Key technical components include visibility-gated semantic bearing extraction, near-field traversability modeling, and world-aligned 3D memory storage. Evaluated on quadrupedal and wheeled platforms over 600–1000 m routes, the method improves simulation success rate by over 10 percentage points over the strongest baseline and achieves a 40% real-world success rate, compared to 17.5% for the strongest baseline.
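To make the idea of a world-aligned cue memory concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a planar (2D) simplification of the 3D memory, a known robot pose, and a linear uncertainty-growth model; the names `Cue`, `store_cue`, `read_cue`, and the parameters `sigma0` and `drift_rate` are hypothetical. The point it shows is that a cue stored in world coordinates can be re-projected into the robot frame after a detour, whereas a robot-centric bearing would go stale.

```python
import math
from dataclasses import dataclass


@dataclass
class Cue:
    x: float      # cue position in the world frame (metres)
    y: float
    sigma: float  # assumed position uncertainty at storage time (metres)


def wrap(angle: float) -> float:
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def store_cue(rx, ry, ryaw, bearing, est_range, sigma0=1.0) -> Cue:
    """Lift a robot-frame bearing and a range estimate into world coordinates."""
    world_angle = ryaw + bearing
    return Cue(rx + est_range * math.cos(world_angle),
               ry + est_range * math.sin(world_angle),
               sigma0)


def read_cue(cue: Cue, rx, ry, ryaw, travelled=0.0, drift_rate=0.02):
    """Re-project the stored cue into the current robot frame.

    Returns (bearing, bearing_std). The uncertainty grows linearly with the
    distance travelled since storage; this growth model is an assumption.
    """
    dx, dy = cue.x - rx, cue.y - ry
    bearing = wrap(math.atan2(dy, dx) - ryaw)
    sigma = cue.sigma + drift_rate * travelled
    bearing_std = math.atan2(sigma, math.hypot(dx, dy))
    return bearing, bearing_std
```

A downstream planner could, for example, trust the remembered bearing less as `bearing_std` grows and fall back to exploration cues; that policy is likewise an assumption and is not specified in the summary.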

📝 Abstract

Outdoor vision-language navigation (VLN) in long-range, open-world environments is frequently disrupted by semantic-cue interruptions, where informative goal cues become sparse, occluded, or leave the field of view. Once such cues disappear, agents enter a cue-free phase and often degrade into backtracking, oscillatory headings, or aimless exploration. While memory-based methods attempt to bridge these gaps, they often fail under traversability-driven detours: the remembered cue direction may be infeasible, forcing detours that prolong cue-free phases and gradually render robot-centric cues stale and implicit histories blurred. This makes traversability a stability condition for maintaining goal-directed guidance, rather than merely a local safety concern. We propose a unified outdoor VLN framework that survives semantic-cue interruptions by maintaining traversability-consistent executable guidance throughout prolonged cue-free phases. Specifically, our method extracts semantic bearings from visibility-gated goal or exploration cues and grounds them into executable headings using a real-time near-field traversability profile, providing goal-consistent feasible guidance beyond reject-only safety filtering. To prevent guidance degradation during detours, we lift intermittent 2D evidence into a world-aligned 3D cue memory with an uncertainty-aware readout mechanism, ensuring guidance remains continuously reachable and stable as the robot moves. We evaluate the framework on quadrupedal and wheeled platforms over 600–1000 m routes. Our method improves simulation success rate by over 10 percentage points over the strongest baseline and achieves a real-world success rate of 40%, compared to 17.5% for the strongest baseline, with substantially higher robustness during prolonged cue-free intervals.
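The step of grounding a semantic bearing into an executable heading can be illustrated with a short sketch. This is a hedged example under stated assumptions, not the paper's algorithm: the traversability profile is assumed to be a list of candidate headings with scores in [0, 1], and the function name `ground_heading`, the threshold `min_score`, and the cost weighting are all hypothetical. It contrasts with reject-only filtering by returning the nearest feasible heading instead of simply discarding an infeasible bearing.

```python
import math


def ground_heading(bearing, profile, min_score=0.5, angle_weight=1.0):
    """Pick an executable heading close to the semantic bearing.

    bearing: desired direction in the robot frame (radians).
    profile: iterable of (heading_rad, traversability_score) pairs, where
             the score lies in [0, 1] (assumed representation).
    Returns the chosen heading, or None if no candidate is traversable.
    """
    best, best_cost = None, math.inf
    for heading, score in profile:
        if score < min_score:
            continue  # treat this direction as not traversable
        # Wrapped angular deviation from the semantic bearing.
        diff = math.atan2(math.sin(heading - bearing),
                          math.cos(heading - bearing))
        # Trade off goal consistency against traversability quality.
        cost = angle_weight * abs(diff) + (1.0 - score)
        if cost < best_cost:
            best, best_cost = heading, cost
    return best


# Usage: the direct path (heading 0) is blocked, so a detour is chosen.
profile = [(-math.pi / 4, 0.9), (0.0, 0.1), (math.pi / 4, 0.8)]
chosen = ground_heading(0.1, profile)
```

In this usage example the candidate at heading 0 falls below `min_score`, and the candidate at π/4 is selected because it deviates least from the requested bearing of 0.1 rad.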

Problem

Research questions and friction points this paper is trying to address.

vision-language navigation

semantic-cue interruption

traversability

outdoor navigation

memory degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

traversability-aware navigation

memory-augmented VLN

semantic-cue interruption