🤖 AI Summary
Existing CAD symbol spotting methods focus primarily on geometric primitives while neglecting textual annotations and failing to explicitly model inter-primitive relationships, which limits holistic understanding of the drawings. To address this, we propose a panoptic symbol spotting framework that fuses geometry and text. Our approach is the first to jointly model textual annotations and geometric primitives, incorporating a type-aware attention mechanism that explicitly captures both spatial and semantic dependencies among primitives. We adopt a hybrid CNN-Transformer architecture within a unified representation learning framework. Evaluated on a real-world CAD dataset, our method achieves significant improvements over state-of-the-art approaches, demonstrating particular robustness on drawings with dense text and complex structural layouts. This work establishes a novel paradigm for end-to-end semantic understanding of engineering drawings.
📝 Abstract
With the widespread adoption of Computer-Aided Design (CAD) drawings in engineering, architecture, and industrial design, the ability to accurately interpret and analyze these drawings has become increasingly critical. Among the various subtasks, panoptic symbol spotting plays a vital role in enabling downstream applications such as CAD automation and design retrieval. Existing methods primarily focus on the geometric primitives within CAD drawings, but they face two major problems: they usually overlook the rich textual annotations present in the drawings, and they lack explicit modeling of the relationships among primitives, resulting in an incomplete understanding of the drawing as a whole. To fill this gap, we propose a panoptic symbol spotting framework that incorporates textual annotations. The framework constructs unified representations by jointly modeling geometric and textual primitives. Then, using visual features extracted by a pretrained CNN as the initial representations, a Transformer-based backbone is employed, enhanced with a type-aware attention mechanism that explicitly models the different types of spatial dependencies between primitives. Extensive experiments on a real-world dataset demonstrate that the proposed method outperforms existing approaches on symbol spotting tasks involving textual annotations and exhibits superior robustness on complex CAD drawings.
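The abstract does not give the exact formulation of the type-aware attention. One common way to realize such a mechanism, shown here purely as a minimal illustrative sketch (the function names, the single-head setup, and the per-type-pair bias table are assumptions, not the paper's actual design), is to offset the attention logits with a learned bias indexed by the (query type, key type) pair, so geometric and textual primitives can attend to one another differently:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def type_aware_attention(x, types, Wq, Wk, Wv, type_bias):
    """Single-head attention whose logits are shifted by a learned bias
    indexed by the (query type, key type) pair. `types[i]` gives the
    primitive type of token i (e.g. 0 = geometric, 1 = textual)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    logits = (q @ k.T) / np.sqrt(k.shape[-1])
    logits = logits + type_bias[np.ix_(types, types)]  # type-pair bias
    return softmax(logits, axis=-1) @ v

# Toy example: 5 primitives with 8-dim features (random weights stand in
# for learned parameters).
rng = np.random.default_rng(0)
n, d = 5, 8
x = rng.normal(size=(n, d))
types = np.array([0, 0, 1, 1, 0])        # hypothetical type labels
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
type_bias = rng.normal(size=(2, 2))      # one learned bias per type pair
out = type_aware_attention(x, types, Wq, Wk, Wv, type_bias)
print(out.shape)  # (5, 8)
```

In practice the bias table could also be replaced by relative-position terms or separate projection weights per type; the sketch only shows the simplest additive variant.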