Nav-R1: Reasoning and Navigation in Embodied Scenes

📅 2025-09-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing embodied navigation methods suffer from incoherent reasoning trajectories, poor cross-environment generalization, and a fundamental trade-off between long-horizon semantic reasoning and low-latency reactive control. To address these challenges, we propose the Fast-in-Slow inference paradigm, which decouples high-level semantic reasoning from low-level reactive control. We introduce Nav-CoT-110K, the first large-scale Chain-of-Thought (CoT) dataset for embodied navigation tasks, comprising 110K expert-annotated reasoning traces. Furthermore, we design a triple-reward reinforcement learning framework based on Group Relative Policy Optimization (GRPO), incorporating rewards for output format fidelity, semantic understanding, and navigation success. Our method combines CoT-based cold-start initialization with end-to-end reinforcement fine-tuning. Evaluated across multiple embodied AI benchmarks, our approach achieves an average performance gain of over 8%, significantly improving reasoning coherence and path accuracy. Extensive ablations and real-robot deployment under resource constraints further validate its efficiency and robustness.
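To make the triple-reward GRPO setup concrete, here is a minimal sketch of how three reward terms could be combined and normalized into group-relative advantages, as GRPO does over a group of rollouts sampled for the same prompt. The reward weights, dictionary keys, and function names are illustrative assumptions, not values from the paper.

```python
def combined_reward(sample, w_format=0.2, w_understand=0.3, w_nav=0.5):
    """Weighted sum of the three reward terms for one rollout.
    Weights are hypothetical; the paper does not publish them here."""
    return (w_format * sample["format_reward"]
            + w_understand * sample["understanding_reward"]
            + w_nav * sample["navigation_reward"])

def group_relative_advantages(group):
    """GRPO-style advantage: normalize each rollout's reward by the
    mean and standard deviation of its sampling group."""
    rewards = [combined_reward(s) for s in group]
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    std = std if std > 0 else 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]
```

Per-group normalization is what lets GRPO dispense with a learned value critic: each rollout is scored only relative to its peers for the same instruction.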

📝 Abstract
Embodied navigation requires agents to integrate perception, reasoning, and action for robust interaction in complex 3D environments. Existing approaches often suffer from incoherent and unstable reasoning traces that hinder generalization across diverse environments, and difficulty balancing long-horizon semantic reasoning with low-latency control for real-time navigation. To address these challenges, we propose Nav-R1, an embodied foundation model that unifies reasoning in embodied environments. We first construct Nav-CoT-110K, a large-scale dataset of step-by-step Chains-of-Thought (CoT) for embodied tasks, which enables cold-start initialization with structured reasoning. Building on this foundation, we design a GRPO-based reinforcement learning framework with three complementary rewards: format, understanding, and navigation, to improve structural adherence, semantic grounding, and path fidelity. Furthermore, we introduce a Fast-in-Slow reasoning paradigm, decoupling deliberate semantic reasoning from low-latency reactive control for efficient yet coherent navigation. Extensive evaluations on embodied AI benchmarks demonstrate that Nav-R1 consistently outperforms strong baselines, with over 8% average improvement in reasoning and navigation performance. Real-world deployment on a mobile robot further validates its robustness under limited onboard resources. Code: https://github.com/AIGeeksGroup/Nav-R1. Website: https://aigeeksgroup.github.io/Nav-R1.
Problem

Research questions and friction points this paper is trying to address.

Addresses incoherent reasoning traces in embodied navigation
Balances long-horizon semantic reasoning with real-time control
Improves generalization across diverse 3D environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale CoT dataset for cold-start reasoning
GRPO reinforcement learning with three complementary rewards
Fast-in-Slow paradigm decouples reasoning from control
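The Fast-in-Slow decoupling described above can be pictured as a control loop in which a slow reasoning module refreshes a high-level plan only every K steps, while a fast policy reacts to every observation. The sketch below is an assumption about the control flow, not the paper's implementation; all function and class names are hypothetical.

```python
def fast_in_slow_loop(slow_reasoner, fast_policy, env, k=5, max_steps=100):
    """Run one episode: deliberate reasoning every k steps,
    reactive low-latency control at every step."""
    obs = env.reset()
    plan = slow_reasoner(obs)              # initial CoT-style plan
    for step in range(max_steps):
        if step > 0 and step % k == 0:
            plan = slow_reasoner(obs)      # infrequent long-horizon update
        action = fast_policy(obs, plan)    # per-step reactive control
        obs, done = env.step(action)
        if done:
            break
    return step
```

The key property is that the expensive reasoner runs at roughly 1/k the frequency of the controller, so per-step latency is bounded by the fast policy alone.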
Authors

Qingxiang Liu
Institute of Computing Technology, Chinese Academy of Sciences
Research interests: time series analysis, foundation models, spatio-temporal data mining, federated learning
Ting Huang
Shanghai University of Engineering Science
Zeyu Zhang
Peking University
Hao Tang
Peking University