FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models

📅 2025-07-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing lane topology reasoning methods model temporal information weakly: they over-rely on historical queries, are sensitive to pose estimation errors, and propagate too little temporal information. This paper proposes FASTopoWM, a fast-slow dual-path framework that operates in bird's-eye-view (BEV) space and jointly performs lane segment detection and topology reasoning. It introduces an action-driven latent query mechanism and a lightweight BEV world model to propagate historical state robustly. It also couples a streaming temporal propagation scheme with parallel supervision at both the query level and the BEV feature level to improve spatiotemporal consistency. On OpenLane-V2, the method reaches 37.4% mAP for lane segment detection (vs. 33.6% for the prior state of the art) and 46.3% OLS for centerline perception (vs. 41.5%), setting a new state of the art on both. The approach targets more reliable, temporally robust perception for end-to-end autonomous driving systems.

📝 Abstract
Lane segment topology reasoning provides comprehensive bird's-eye view (BEV) road scene understanding, which can serve as a key perception module in planning-oriented end-to-end autonomous driving systems. Existing lane topology reasoning methods often fall short in effectively leveraging temporal information to enhance detection and reasoning performance. Recently, stream-based temporal propagation methods have demonstrated promising results by incorporating temporal cues at both the query and BEV levels. However, they remain limited by over-reliance on historical queries, vulnerability to pose estimation failures, and insufficient temporal propagation. To overcome these limitations, we propose FASTopoWM, a novel fast-slow lane segment topology reasoning framework augmented with latent world models. To reduce the impact of pose estimation failures, this unified framework enables parallel supervision of both historical and newly initialized queries, facilitating mutual reinforcement between the fast and slow systems. Furthermore, we introduce latent query and BEV world models conditioned on the action latent to propagate the state representations from past observations to the current timestep. This design substantially improves the performance of temporal perception within the slow pipeline. Extensive experiments on the OpenLane-V2 benchmark demonstrate that FASTopoWM outperforms state-of-the-art methods in both lane segment detection (37.4% vs. 33.6% mAP) and centerline perception (46.3% vs. 41.5% OLS).
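The idea of propagating query state with an action-conditioned latent model, while keeping a separate set of newly initialized queries, can be illustrated with a toy sketch. This is not the paper's architecture: the class `ToyLatentQueryWorldModel`, the helper `build_fast_and_slow_queries`, the single random linear layer, the residual `tanh` update, and all dimensions and values are hypothetical choices made only for illustration. The mapping of "fast" to newly initialized queries and "slow" to propagated historical queries is one reading of the abstract, not a confirmed detail.

```python
import math
import random


def make_layer(rng, out_dim, in_dim, scale=0.1):
    """Create a small random weight matrix and a zero bias (toy initialization)."""
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(weights, bias, x):
    """Compute W x + b for a vector x given as a list of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class ToyLatentQueryWorldModel:
    """Hypothetical stand-in for an action-conditioned latent world model."""

    def __init__(self, query_dim, action_dim, seed=0):
        rng = random.Random(seed)
        self.query_dim = query_dim
        self.action_dim = action_dim
        self.weights, self.bias = make_layer(rng, query_dim, query_dim + action_dim)

    def step(self, prev_queries, action_latent):
        """Predict current-step query latents from previous ones and an action latent."""
        propagated = []
        for query in prev_queries:
            # Concatenate the query state with the action latent.
            hidden = linear(self.weights, self.bias, query + action_latent)
            # Residual update: previous state plus a bounded, action-dependent change.
            propagated.append([q + math.tanh(h) for q, h in zip(query, hidden)])
        return propagated


def build_fast_and_slow_queries(model, prev_queries, action_latent, init_queries):
    """Return (fast, slow) query sets so both can be decoded and supervised.

    fast: newly initialized queries that ignore history (assumed reading).
    slow: historical queries propagated by the world model (assumed reading).
    """
    fast = [list(q) for q in init_queries]
    slow = model.step(prev_queries, action_latent)
    return fast, slow


model = ToyLatentQueryWorldModel(query_dim=4, action_dim=2, seed=0)
prev_queries = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.5, -0.1, 0.2]]
init_queries = [[0.0] * 4 for _ in range(2)]
action_latent = [1.0, -0.5]  # made-up encoding of ego motion between frames

fast, slow = build_fast_and_slow_queries(model, prev_queries, action_latent, init_queries)
print(len(fast), len(slow), len(slow[0]))
```

In this sketch the propagation step needs no explicit pose transform, which is the kind of property that would reduce sensitivity to pose estimation failures. A real system would learn the transition from data and decode both query sets with a shared detection and topology head.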
Problem

Research questions and friction points this paper is trying to address.

Improving lane topology reasoning with temporal information
Reducing reliance on historical queries and pose estimation
Enhancing temporal perception via latent world models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel supervision of historical and new queries
Latent query and BEV world models
Fast-slow lane segment topology reasoning
Yiming Yang
FNii-Shenzhen, Shenzhen, China; SSE, CUHK-Shenzhen, Shenzhen, China
Hongbin Lin
FNii-Shenzhen, Shenzhen, China; SSE, CUHK-Shenzhen, Shenzhen, China
Yueru Luo
The Chinese University of Hong Kong, Shenzhen
Computer Vision
Suzhong Fu
FNii-Shenzhen, Shenzhen, China; SSE, CUHK-Shenzhen, Shenzhen, China
Chao Zheng
T Lab, Tencent, Beijing, China
Xinrui Yan
T Lab, Tencent, Beijing, China
Shuqi Mei
T Lab, Tencent, Beijing, China
Kun Tang
T Lab, Tencent, Beijing, China
Shuguang Cui
Distinguished Presidential Chair Professor, School of Science and Engineering, CUHKSZ
AI+Networking, Wireless Communications
Zhen Li
SSE, CUHK-Shenzhen, Shenzhen, China; FNii-Shenzhen, Shenzhen, China