GN0: Toward a Unified Paradigm for Generation, Evaluation, and Policy Learning in Visual-Language Navigation

📅 2026-06-02
🤖 AI Summary
This work addresses two limitations of vision-and-language navigation (VLN), data scarcity and insufficient simulation fidelity, that hinder generalization to complex, long-horizon tasks. The authors propose a unified VLN paradigm comprising the large-scale GN-Matrix dataset and a high-fidelity interactive simulator built on 3D Gaussian Splatting (3DGS). They introduce GN-BAE, an end-to-end navigation foundation model trained in stages: supervised learning first, then DAgger to expose the model to states induced by its own rollouts, then reinforcement learning. They also use 3DGS-rendered bird's-eye-view (BEV) representations as a compact memory to improve spatial reasoning in vision-language models. For evaluation, they release GN-Bench, described as the first BEV-based VLN benchmark, which incorporates dynamic 3DGS avatars for human-robot interaction. Experiments on GN-Bench and VLN-CE show that the approach outperforms state-of-the-art VLN methods. The paradigm covers instruction following, goal navigation, and human following.
📝 Abstract
Embodied navigation connects intelligent agents with the physical world and is fundamental for general robotic intelligence. Limited availability and quality of navigation data have constrained Vision-and-Language Navigation (VLN) systems' generalization and long-horizon capabilities. To address this, we curate diverse 3D scenes and develop an automated pipeline for large-scale navigation data, resulting in the GN-Matrix dataset. Building on a 3D Gaussian Splatting (3DGS) engine, we introduce a high-fidelity simulation platform supporting interactive roaming and collision-aware navigation. We further propose GN-Bench, the first BEV-based benchmark incorporating dynamic 3DGS avatars for human-robot interaction evaluation. To leverage the simulator, we develop an RL-driven navigation foundation model, Break and Establish (BAE). After supervised learning, DAgger exposes the model to rollout-induced states, breaking narrow expert-centric distributions and enabling downstream RL exploration. This unified VLN paradigm integrates map-based and map-free tasks, including instruction following, human following, and goal navigation. GN-BAE formalizes high-fidelity 3DGS-rendered Bird's Eye View representations as compact memory, unlocking latent spatial reasoning in VLMs. Extensive evaluations on GN-Bench and VLN-CE show that GN0 outperforms state-of-the-art VLN methods. Overall, GN-Matrix offers a unified framework spanning data, simulation, and learning, advancing embodied navigation in research and industrial applications.
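The role the abstract assigns to DAgger (exposing the learner to rollout-induced states after supervised learning) can be illustrated with a minimal sketch of the generic DAgger loop. This is not the paper's GN-BAE training code: the 1D corridor, the tabular policy, and every name below (`expert_action`, `fit`, `act`, `rollout`, `dagger`) are hypothetical stand-ins chosen for illustration.

```python
GOAL, N_STATES = 5, 11  # toy corridor with positions 0..10 (assumed setup)

def expert_action(pos):
    # Hypothetical expert: always step toward GOAL (+1, -1, or 0 at the goal).
    return (pos < GOAL) - (pos > GOAL)

def fit(dataset):
    # "Training" is a lookup table built from (state, action) pairs.
    return dict(dataset)

def act(policy, pos):
    # States never seen in training fall back to +1, a deliberately poor default.
    return policy.get(pos, 1)

def rollout(policy_fn, start, steps=12):
    # Return the states visited when following policy_fn from start.
    pos, visited = start, []
    for _ in range(steps):
        visited.append(pos)
        pos = min(max(pos + policy_fn(pos), 0), N_STATES - 1)
    return visited

def behavior_cloning():
    # Supervised stage: only states on the expert's own trajectory get labels.
    states = rollout(expert_action, start=0)
    return [(s, expert_action(s)) for s in states]

def dagger(dataset, iterations=2):
    dataset = list(dataset)
    policy = fit(dataset)
    for _ in range(iterations):
        for start in range(N_STATES):
            # Roll out the learner, then have the expert relabel the visited states.
            visited = rollout(lambda p: act(policy, p), start)
            dataset += [(s, expert_action(s)) for s in visited]
        policy = fit(dataset)  # retrain on the aggregated dataset
    return policy

def errors(policy):
    # Number of states where the policy disagrees with the expert.
    return sum(act(policy, s) != expert_action(s) for s in range(N_STATES))

bc_data = behavior_cloning()
bc_policy = fit(bc_data)
dagger_policy = dagger(bc_data)
```

In this toy setting the behavior-cloned policy has labels only for positions 0 through 5, so `errors(bc_policy)` counts the five positions past the goal where it acts wrongly. After the learner's own rollouts reach those positions and the expert relabels them, `errors(dagger_policy)` drops to zero.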
Problem

Research questions and friction points this paper is trying to address.

Vision-and-Language Navigation
embodied navigation
navigation data scarcity
generalization
long-horizon navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting
Bird's Eye View
Vision-and-Language Navigation
Reinforcement Learning
Embodied AI
Xinhai Li
Institute of Artificial Intelligence, China Telecom
Xiaotao Zhang
Institute of Artificial Intelligence, China Telecom; Shanghai Jiao Tong University
Yuehao Huang
Institute of Artificial Intelligence, China Telecom; Zhejiang University
Jiankun Dong
Institute of Artificial Intelligence, China Telecom
Tianhang Wang
Institute of Artificial Intelligence, China Telecom; Tongji University
Sunyao Zhou
Institute of Artificial Intelligence, China Telecom; Fudan University
Yunzi Wu
Institute of Artificial Intelligence, China Telecom; Tongji University
Chengnuo Sun
Institute of Artificial Intelligence, China Telecom; Jiangsu University
Yunfei Ge
New York University
Game theory, Cyber Security
Qizhen Weng
Hong Kong University of Science and Technology
Machine Learning Systems, AI Infrastructure, Cloud Computing
Chi Zhang
Institute of Artificial Intelligence, China Telecom
Chenjia Bai
Institute of Artificial Intelligence, China Telecom (TeleAI)
Reinforcement Learning, Robotics, Embodied AI
Xuelong Li
Institute of Artificial Intelligence, China Telecom