🤖 AI Summary
Existing Transformer-based line detection methods achieve high accuracy but suffer from slow inference and heavy reliance on large-scale pretraining datasets (e.g., COCO), limiting their applicability to real-time video analysis. Method: We propose the first pretraining-free, efficient line detection framework. It introduces a geometrically aware Deformable Line Attention (DLA) mechanism that explicitly encodes the structural priors of lines, eliminating the dependence on large-scale pretraining, and integrates a lightweight Transformer architecture with end-to-end differentiable line fitting to jointly optimize accuracy and speed. Contribution/Results: Our method achieves state-of-the-art sAP on out-of-distribution benchmarks while significantly accelerating inference, reaching real-time throughput suitable for live video stream processing.
📝 Abstract
Line detection is a basic digital image processing operation used by higher-level processing methods. Recently, transformer-based methods for line detection have proven to be more accurate than CNN-based methods, but at the cost of significantly slower inference. As a result, video analysis methods that require low latency cannot benefit from current transformer-based line detectors. In addition, current transformer-based models require pretraining their attention mechanisms on large datasets (e.g., COCO or Objects365). This paper develops a new transformer-based method that is significantly faster and does not require pretraining the attention mechanism on large datasets. We eliminate the need for such pretraining with a new attention mechanism, Deformable Line Attention (DLA), and refer to the resulting DLA-based transformer method as LINEA. Extensive experiments show that LINEA is significantly faster than previous models and outperforms them on sAP in out-of-distribution dataset testing.
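The abstract does not spell out how Deformable Line Attention operates internally. A minimal illustrative sketch of the general idea, attending to feature-map locations sampled along a candidate line segment rather than over a free 2D neighborhood, might look like the following. All names, shapes, and the nearest-neighbor sampling here are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def deformable_line_attention(feat, p0, p1, offsets, weights, K=4):
    """Illustrative sketch (not the paper's code): aggregate features
    sampled at K deformable points along a candidate line segment.

    feat:    (H, W, C) feature map
    p0, p1:  (2,) segment endpoints in (y, x) pixel coordinates
    offsets: (K, 2) learned per-point offsets (assumed small)
    weights: (K,) unnormalized attention logits over the K points
    """
    H, W, C = feat.shape
    # Reference points spaced uniformly along the segment: the structural
    # prior is that informative features lie on or near the line itself.
    t = np.linspace(0.0, 1.0, K)[:, None]            # (K, 1)
    ref = (1.0 - t) * p0[None, :] + t * p1[None, :]  # (K, 2)
    pts = ref + offsets                              # deformed sample points
    # Nearest-neighbor lookup for brevity; a real model would use
    # differentiable bilinear sampling.
    ys = np.clip(np.round(pts[:, 0]).astype(int), 0, H - 1)
    xs = np.clip(np.round(pts[:, 1]).astype(int), 0, W - 1)
    sampled = feat[ys, xs]                           # (K, C)
    attn = np.exp(weights - weights.max())
    attn = attn / attn.sum()                         # softmax over K points
    return attn @ sampled                            # (C,) aggregated feature
```

Restricting the sampling locations to a line-shaped neighborhood is one way an attention mechanism could encode a geometric prior directly, which is consistent with the paper's claim that such priors remove the need to learn attention patterns from large-scale pretraining.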