🤖 AI Summary
This work proposes the first end-to-end 3D road topology prediction framework under a pure mask-based paradigm, addressing key limitations of existing 2D mask-based methods, namely their susceptibility to discretization artifacts and geographic data leakage during evaluation. The framework introduces two novel prediction heads: a dense offset field for sub-grid location refinement and a height map for elevation estimation, while incorporating LiDAR data to enhance long-range performance. To mitigate memorization bias, the authors establish a geographically disjoint data split and a long-range (±100 m) evaluation benchmark. On the geographically disjoint benchmark, the method achieves a state-of-the-art OLS score of 28.5, demonstrating the robustness of mask representations against geographic overfitting and the substantial benefits of LiDAR fusion in distant scenarios.
📝 Abstract
Mask-based paradigms for road topology understanding, such as TopoMaskV2, offer a complementary alternative to query-based methods by generating centerlines via a dense rasterized intermediate representation. However, prior work was limited to 2D predictions and suffered from severe discretization artifacts, necessitating fusion with parametric heads. We introduce TopoMaskV3, which advances this pipeline into a robust, standalone 3D predictor via two novel dense prediction heads: a dense offset field for sub-grid discretization correction within the existing BEV resolution, and a dense height map for direct 3D estimation. Beyond the architecture, we are the first to address geographic data leakage in road topology evaluation by introducing (1) geographically distinct splits to prevent memorization and ensure fair generalization, and (2) a long-range (±100 m) benchmark. TopoMaskV3 achieves a state-of-the-art OLS of 28.5 on this geographically disjoint benchmark, surpassing all prior methods. Our analysis shows that the mask representation is more robust to geographic overfitting than the Bézier representation, while LiDAR fusion is most beneficial at long range and exhibits larger relative gains on the overlapping original split, suggesting overlap-induced memorization effects.
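To make the role of the two dense heads concrete, the following is a minimal sketch of how a rasterized mask, a per-cell offset field, and a per-cell height map could be combined into metric 3D points. It is an illustration under stated assumptions, not the TopoMaskV3 implementation: the function name `decode_points_3d`, the threshold, the convention that offsets are expressed in cell units relative to the cell center, and the grid origin parameters are all hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]
Point3D = Tuple[float, float, float]


def decode_points_3d(
    mask: Grid,
    offset_x: Grid,
    offset_y: Grid,
    height: Grid,
    resolution_m: float,
    x_min_m: float,
    y_min_m: float,
    threshold: float = 0.5,
) -> List[Point3D]:
    """Turn dense BEV predictions into 3D points (hypothetical decoding).

    Assumptions (not taken from the paper):
      * mask[row][col] is a centerline probability in [0, 1];
      * offsets are in cell units, measured from the cell center;
      * height[row][col] is the elevation in meters at that cell.
    """
    points: List[Point3D] = []
    for row, mask_row in enumerate(mask):
        for col, prob in enumerate(mask_row):
            if prob < threshold:
                continue  # cell is not part of the centerline
            # Cell center plus the predicted sub-grid correction.
            x = x_min_m + (col + 0.5 + offset_x[row][col]) * resolution_m
            y = y_min_m + (row + 0.5 + offset_y[row][col]) * resolution_m
            z = height[row][col]  # read elevation from the height map
            points.append((x, y, z))
    return points


# Toy 2x2 grid with 0.5 m cells covering [-0.5, 0.5] m on both axes.
points = decode_points_3d(
    mask=[[0.9, 0.1], [0.2, 0.8]],
    offset_x=[[0.25, 0.0], [0.0, -0.5]],
    offset_y=[[0.0, 0.0], [0.0, 0.25]],
    height=[[1.0, 0.0], [0.0, 2.0]],
    resolution_m=0.5,
    x_min_m=-0.5,
    y_min_m=-0.5,
)
print(points)
```

Without the offset term, every decoded coordinate would snap to a cell center, so the positional error could reach half the cell size on each axis. In this sketch the offset field removes that quantization without requiring a finer BEV grid.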