🤖 AI Summary
To address the weak content-structure correlation and inaccurate spatial localization of cells in geometrically distorted tables, this paper proposes OG-HFYOLO, a fine-grained cell instance segmentation framework. The method uses a gradient-orientation-aware feature extractor to model deformation orientation, a heterogeneous kernel cross-fusion module to improve local geometric robustness, and a scale-aware loss function combined with mask-driven non-maximum suppression (NMS) to improve localization accuracy. The authors also construct DWTAL, a benchmark dataset for precise cell localization in distorted tables, and augment training with synthetic data to improve generalization. Experiments report performance gains over mainstream instance segmentation baselines, including Mask R-CNN and QueryInst. Both the source code and the DWTAL dataset are publicly released.
📝 Abstract
Table structure recognition is a key task in document analysis. However, geometric deformation in distorted tables weakens the correlation between content and structure, so downstream tasks cannot obtain accurate content information. To obtain fine-grained spatial coordinates of cells, we propose the OG-HFYOLO model, which enhances edge response with a Gradient Orientation-aware Extractor, combines a Heterogeneous Kernel Cross Fusion module with a scale-aware loss function to adapt to multi-scale object features, and introduces mask-driven non-maximum suppression in post-processing to replace the traditional bounding-box suppression mechanism. We also propose a data generator that fills the gap in datasets for fine-grained spatial localization of cells in deformed tables, and derive a large-scale dataset named Deformation Wired Table (DWTAL). Experiments show that the proposed model achieves excellent segmentation accuracy compared with mainstream instance segmentation models. The dataset and the source code are open source: https://github.com/justliulong/OGHFYOLO.
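To illustrate why mask-driven suppression can matter for distorted cells, the following is a minimal sketch of greedy NMS that compares instance masks instead of bounding boxes. It is not taken from the OG-HFYOLO repository: the function names `mask_iou` and `mask_nms`, the threshold of 0.5, and the toy triangular masks are all assumptions made for illustration. The two triangular "cells" have almost identical bounding boxes, so box-based NMS would likely discard one of them, whereas their mask IoU is zero.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks with the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def mask_nms(masks, scores, iou_threshold=0.5):
    """Greedy NMS on mask IoU (hypothetical helper, not the authors' code).

    Returns the indices of the kept masks, highest score first.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    keep = []
    for idx in order:
        # Keep a candidate only if it does not overlap any kept mask too much.
        if all(mask_iou(masks[idx], masks[k]) <= iou_threshold for k in keep):
            keep.append(int(idx))
    return keep


# Toy data: two adjacent skewed cells plus a near-duplicate of the first one.
n = 10
upper = np.triu(np.ones((n, n), dtype=bool), k=1)   # cell above the diagonal
lower = np.tril(np.ones((n, n), dtype=bool), k=-1)  # cell below the diagonal
duplicate = upper.copy()
duplicate[0, :] = False                             # slightly smaller copy of `upper`

masks = [upper, lower, duplicate]
scores = [0.9, 0.8, 0.7]
kept = mask_nms(masks, scores)
print(kept)  # [0, 1]
```

In this toy case the near-duplicate is suppressed (mask IoU 0.8 with the first cell) while both genuine cells survive. The pairwise loop is quadratic in the number of detections, which is acceptable for a sketch but would typically be vectorized in practice.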