RANa: Retrieval-Augmented Navigation

📅 2025-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of experience reuse in embodied navigation, where agents typically discard historical knowledge after each episode. To overcome this limitation, we propose a retrieval-augmented navigation framework that dynamically retrieves and fuses semantic and geometric information accumulated across prior tasks in the same environment, a departure from conventional episodic-memory reset paradigms. Methodologically, we design a multimodal database retrieval and context-encoding mechanism grounded in vision foundation models, integrated with end-to-end reinforcement learning. We introduce the first navigation architecture supporting zero-shot transfer across both tasks and environments, and establish a novel benchmark that explicitly evaluates the reuse of historical information. Experiments demonstrate significant performance gains on ObjectNav, ImageNav, and Instance-ImageNav, validating strong generalization and zero-shot cross-task and cross-environment transfer capabilities.

📝 Abstract
Methods for navigation based on large-scale learning typically treat each episode as a new problem, where the agent is spawned with a clean memory in an unknown environment. While generalization to unknown environments is extremely important, we claim that, in a realistic setting, an agent should have the capacity to exploit information collected during earlier robot operations. We address this by introducing a new retrieval-augmented agent, trained with RL, capable of querying a database collected from previous episodes in the same environment and of learning how to integrate this additional context information. We introduce a unique agent architecture for the general navigation task, evaluated on ObjectNav, ImageNav and Instance-ImageNav. Our retrieval and context-encoding methods are data-driven and heavily employ vision foundation models (FMs) for both semantic and geometric understanding. We propose new benchmarks for these settings and show that retrieval allows zero-shot transfer across tasks and environments while significantly improving performance.
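The core mechanism the abstract describes, a per-environment database of past-episode observations that the agent queries at each step, can be sketched minimally as follows. This is an illustrative stand-in only: the class name, the metadata format, and the use of cosine similarity over fixed embeddings are assumptions for the sketch; the paper's actual retrieval is learned end-to-end on vision-FM features.

```python
import numpy as np


class EpisodeMemory:
    """Hypothetical per-environment database of past-episode observations.

    Each entry pairs a vision-FM embedding with arbitrary metadata
    (e.g. agent pose or a semantic label). Retrieval here is plain
    cosine-similarity nearest neighbors, a simplification of the
    paper's learned, data-driven retrieval.
    """

    def __init__(self, dim: int):
        self.dim = dim
        self.embeddings = np.empty((0, dim), dtype=np.float32)
        self.metadata: list[dict] = []

    def add(self, embedding: np.ndarray, meta: dict) -> None:
        # Normalize so that a dot product equals cosine similarity.
        e = embedding / (np.linalg.norm(embedding) + 1e-8)
        self.embeddings = np.vstack([self.embeddings, e[None, :]])
        self.metadata.append(meta)

    def query(self, embedding: np.ndarray, k: int = 4):
        """Return the k stored entries most similar to the query embedding."""
        q = embedding / (np.linalg.norm(embedding) + 1e-8)
        sims = self.embeddings @ q
        idx = np.argsort(-sims)[:k]
        return [(self.metadata[i], float(sims[i])) for i in idx]
```

Because the database persists across episodes in the same environment, entries written during one task (e.g. ObjectNav) remain available to a later, different task, which is the property the zero-shot cross-task experiments exercise.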
Problem

Research questions and friction points this paper is trying to address.

Enhancing navigation by leveraging past episode data
Integrating retrieval-augmented context into RL-trained agents
Enabling zero-shot transfer across tasks via vision FMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-augmented agent trained with RL
Database querying from previous episodes
Vision foundation models for context encoding
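The bullets above cover retrieval and context encoding; the remaining step is fusing the retrieved entries with the current observation before the policy acts on them. A minimal sketch of one plausible fusion, attention-style pooling of retrieved embeddings conditioned on the current observation, is below. The function name and the pooling scheme are assumptions for illustration; the paper's context encoder is a learned component trained jointly with the RL agent.

```python
import numpy as np


def fuse_context(obs_emb: np.ndarray, retrieved: np.ndarray) -> np.ndarray:
    """Pool retrieved embeddings with attention weights conditioned on the
    current observation, then concatenate into a single policy input.

    obs_emb:   (d,) embedding of the current observation.
    retrieved: (k, d) embeddings returned from the episode database.
    Returns a (2d,) vector: [current observation | pooled context].
    """
    # Scaled dot-product scores of each retrieved entry against the query.
    scores = retrieved @ obs_emb / np.sqrt(obs_emb.shape[0])
    # Numerically stable softmax over the k retrieved entries.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted average of the retrieved embeddings.
    context = weights @ retrieved
    return np.concatenate([obs_emb, context])
```

Concatenation keeps the current observation intact even when retrieval returns irrelevant entries, so the policy can learn to down-weight the context half of the input, one reason this style of fusion is common in retrieval-augmented architectures.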
G. Monaci
Naver Labs Europe, Meylan, France
R. S. Rezende
Naver Labs Europe, Meylan, France
Romain Deffayet
Naver Labs Europe
Reinforcement Learning, Recommender Systems, Unbiased Learning to Rank
G. Csurka
Naver Labs Europe, Meylan, France
G. Bono
Naver Labs Europe, Meylan, France
Hervé Déjean
Naver Labs Europe, Meylan, France
S. Clinchant
Naver Labs Europe, Meylan, France
Christian Wolf
Naver Labs Europe
AI for Robotics, Machine Learning, Computer Vision