🤖 AI Summary
Existing wireframe parsing methods often suffer from geometric inconsistency and limited robustness due to the decoupled prediction of line segments and junctions. To address this, this work proposes a point-line collaborative parsing framework that introduces, for the first time, a bidirectional spatial prompting mechanism between points and lines. Specifically, a Point-Line Prompt Encoder generates spatially aligned geometric prompts, which are then integrated into a Cross-Guidance Line Decoder to enable mutual guidance and end-to-end joint optimization of junctions and lines. By combining sparse attention with geometric attribute encoding, the proposed method achieves significant improvements in parsing accuracy and robustness on both the Wireframe and YorkUrban datasets, while maintaining real-time inference efficiency.
📝 Abstract
Wireframe parsing aims to recover line segments and their junctions to form a structured geometric representation useful for downstream tasks such as Simultaneous Localization and Mapping (SLAM). Existing methods predict lines and junctions separately and reconcile them post hoc, causing mismatches and reduced robustness. We present Co-PLNet, a point-line collaborative framework that exchanges spatial cues between the two tasks: early detections are converted into spatial prompts by a Point-Line Prompt Encoder (PLP-Encoder), which encodes geometric attributes into compact, spatially aligned maps. A Cross-Guidance Line Decoder (CGL-Decoder) then refines predictions with sparse attention conditioned on the complementary prompts, enforcing point-line consistency while remaining efficient. Experiments on the Wireframe and YorkUrban datasets show consistent improvements in accuracy and robustness together with real-time inference, demonstrating the effectiveness of our approach for structured geometry perception.
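To make the notion of a "spatially aligned geometric prompt" concrete, the sketch below rasterizes early junction detections into a coarse confidence grid, the kind of map a prompt encoder could hand to a line decoder. This is an illustration only, not the paper's implementation: the function name, the (x, y, score) detection format, and the max-pooling grid scheme are all assumptions.

```python
def junctions_to_prompt_grid(junctions, img_size, grid_size):
    """Rasterize (x, y, score) junction detections into a coarse
    grid_size x grid_size prompt map over a square img_size image.
    Each cell keeps the highest confidence that lands in it, giving a
    compact, spatially aligned summary of where junctions were found."""
    grid = [[0.0] * grid_size for _ in range(grid_size)]
    scale = grid_size / img_size  # image pixels -> grid cells
    for x, y, score in junctions:
        gx = min(int(x * scale), grid_size - 1)
        gy = min(int(y * scale), grid_size - 1)
        grid[gy][gx] = max(grid[gy][gx], score)
    return grid

# Two detections on a 128x128 image, summarized on a 4x4 grid.
prompt = junctions_to_prompt_grid(
    [(10, 10, 0.9), (120, 60, 0.5)], img_size=128, grid_size=4)
```

A decoder conditioned on such a map can bias its (sparse) attention toward cells with high junction confidence; an analogous map built from line endpoints would provide the reverse direction of the point-line exchange.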