CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention

📅 2024-02-09
🏛️ arXiv.org
📈 Citations: 3
✨ Influential: 1
📄 PDF
🤖 AI Summary
To address the limited robustness and high computational cost of the two-stage bird's-eye-view (BEV) transformation pipeline used in monocular 3D lane detection, this paper proposes an end-to-end single-stage Transformer architecture that reformulates 3D lane detection as a temporally aware curve propagation task, thereby eliminating the explicit perspective-to-BEV feature transformation. Key contributions include: (1) a curve query mechanism coupled with a dynamic, ordered anchor point set; and (2) dedicated modules for curve cross-attention, context sampling, anchor point restriction, and sparse temporal curve fusion. Evaluated on two public real-world benchmarks, the method outperforms existing CNN- and Transformer-based approaches, and ablation studies validate the contribution of each component.

๐Ÿ“ Abstract
In autonomous driving, accurate 3D lane detection using monocular cameras is important for downstream tasks. Recent CNN and Transformer approaches usually apply a two-stage model design. The first stage transforms the image feature from a front image into a bird's-eye-view (BEV) representation. Subsequently, a sub-network processes the BEV feature to generate the 3D detection results. However, these approaches heavily rely on a challenging image feature transformation module from a perspective view to a BEV representation. In our work, we present CurveFormer++, a single-stage Transformer-based method that does not require the view transform module and directly infers 3D lane results from the perspective image features. Specifically, our approach models the 3D lane detection task as a curve propagation problem, where each lane is represented by a curve query with a dynamic and ordered anchor point set. By employing a Transformer decoder, the model can iteratively refine the 3D lane results. A curve cross-attention module is introduced to calculate similarities between image features and curve queries. To handle varying lane lengths, we employ context sampling and anchor point restriction techniques to compute more relevant image features. Furthermore, we apply a temporal fusion module that incorporates selected informative sparse curve queries and their corresponding anchor point sets to leverage historical information. In the experiments, we evaluate our approach on two public real-world datasets. The results demonstrate that our method provides outstanding performance compared with both CNN- and Transformer-based methods. We also conduct ablation studies to analyze the impact of each component.
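The abstract's core idea can be sketched in miniature: a lane is a curve query over an ordered anchor point set, and a decoder step refines those anchors using cross-attention between the query and image features. The toy dot-product attention and all names and shapes below are illustrative assumptions, not the paper's actual implementation:

```python
# Toy sketch of curve queries with iterative anchor refinement.
# NOT the paper's implementation; a minimal illustration of the idea.
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query, keys, values):
    """Dot-product similarity between one curve query embedding and
    a set of image feature vectors, then a weighted sum of values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def refine_anchors(anchors, offsets):
    """One decoder iteration: shift each (x, z) anchor by a predicted
    offset; longitudinal positions stay fixed and ordered."""
    return [(x + dx, z + dz) for (x, z), (dx, dz) in zip(anchors, offsets)]

# A curve query: ordered (x, z) anchors at fixed longitudinal slots.
anchors = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)]
# In the real model, offsets would be predicted from the attended context.
context = cross_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[2.0], [4.0]])
refined = refine_anchors(anchors, [(0.05, 0.01)] * 3)
```

In the actual method, the decoder repeats this refine step across layers, and temporal fusion would carry the refined queries and anchors into the next frame; here a single step just shows the data flow.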
Problem

Research questions and friction points this paper is trying to address.

Monocular cameras must deliver 3D lane detection accurate enough for downstream autonomous-driving tasks.
Two-stage designs depend on a challenging perspective-to-BEV image feature transformation module.
Single-frame predictions leave historical lane information unused.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-stage Transformer for 3D lane detection
Curve propagation with dynamic anchor points
Temporal fusion leveraging historical lane data
Yifeng Bai
NullMax, Shanghai, 201210, China
Zhirong Chen
Institute of Computing Technology, Chinese Academy of Sciences
Pengpeng Liang
School of Computer and Artificial Intelligence, Zhengzhou University, 450001, China
Erkang Cheng
NullMax, Shanghai, 201210, China