🤖 AI Summary
To address the high annotation cost and limited supervision signals in table detection, this paper proposes an active learning framework that integrates uncertainty estimation with diversity sampling. Building on CascadeTabNet and YOLOv9 as baseline detectors, we design a dual-criterion sample selection strategy to maximize information gain under constrained annotation budgets. Experiments on TableBank-LaTeX and TableBank-Word demonstrate that our method significantly outperforms random sampling in mAP at equivalent annotation effort; notably, it achieves over 95% of the fully supervised model's performance using only ~40% of the annotated data. Our core contribution lies in establishing an efficient, table-structure-aware active learning paradigm that jointly optimizes sample discriminability and distributional representativeness, thereby providing a scalable technical pathway for low-resource document understanding.
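A dual-criterion selection of this kind could be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the `select_batch` function, the least-confidence uncertainty score, and the k-center-style diversity term are all assumptions filled in for concreteness.

```python
import numpy as np

def select_batch(confidences, features, budget, alpha=0.5):
    """Greedily pick `budget` samples combining uncertainty and diversity.

    confidences: (N,) max detection confidence per unlabeled image
                 (hypothetical per-image summary of detector outputs)
    features:    (N, D) embedding per image, used for the diversity term
    alpha:       trade-off weight between the two criteria
    """
    confidences = np.asarray(confidences, dtype=float)
    features = np.asarray(features, dtype=float)
    n = len(confidences)

    uncertainty = 1.0 - confidences          # least-confidence score
    min_dist = np.full(n, np.inf)            # distance to nearest selected
    selected = []

    for _ in range(budget):
        if selected:
            # normalize k-center distances into [0, 1]
            diversity = min_dist / (min_dist.max() + 1e-12)
        else:
            diversity = np.zeros(n)          # no reference set yet
        score = alpha * uncertainty + (1.0 - alpha) * diversity
        score[selected] = -np.inf            # never re-pick a sample
        pick = int(np.argmax(score))
        selected.append(pick)
        # update nearest-selected distance for all candidates
        d = np.linalg.norm(features - features[pick], axis=1)
        min_dist = np.minimum(min_dist, d)
    return selected
```

The first pick is driven purely by uncertainty; each later pick balances uncertainty against distance from the already-selected set, which is what keeps the labeled batch both informative and distributionally representative.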
📝 Abstract
Efficient data annotation remains a critical challenge in machine learning, particularly for object detection tasks that require extensive labeled data. Active learning (AL) has emerged as a promising solution for minimizing annotation costs by selecting the most informative samples. While traditional AL approaches rely primarily on uncertainty-based selection, recent advances suggest that incorporating diversity-based strategies can improve sampling efficiency in object detection. Our approach combines both criteria, ensuring the selection of representative examples that improve model generalization. We evaluate our method on two benchmark datasets (TableBank-LaTeX, TableBank-Word) using state-of-the-art table detection architectures, CascadeTabNet and YOLOv9. Our results demonstrate that AL-based example selection significantly outperforms random sampling, achieving higher mAP scores within the same annotation budget and performance comparable to fully supervised models at a fraction of the annotation effort.