PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction

📅 2025-12-11

🤖 AI Summary
In visual document understanding, multi-page table structure recognition is hindered by scarce annotated data and the absence of large-scale benchmarks. To address this, we introduce PubTables-v2, the first large-scale benchmark explicitly designed for full-page and cross-page table scenarios, featuring fine-grained, multi-page structural annotations. For end-to-end parsing, we propose the Page-Object Table Transformer (POTATR), a unified model that directly maps document images to graph-structured table representations, jointly modeling document-level object detection and structural parsing. POTATR integrates vision-language pretraining with a Table Transformer architecture to enable image-to-graph generation. Experiments demonstrate substantial improvements in multi-page table recognition performance. The PubTables-v2 dataset, source code, and pretrained models will be released publicly, establishing a new standard benchmark for multi-page table understanding.

πŸ“ Abstract
Table extraction (TE) is a key challenge in visual document understanding. Traditional approaches detect tables first, then recognize their structure. Recently, interest has surged in developing methods, such as vision-language models (VLMs), that can extract tables directly in their full page or document context. However, progress has been difficult to demonstrate due to a lack of annotated data. To address this, we create a new large-scale dataset, PubTables-v2. PubTables-v2 supports a number of current challenging table extraction tasks. Notably, it is the first large-scale benchmark for multi-page table structure recognition. We demonstrate its usefulness by evaluating domain-specialized VLMs on these tasks and highlighting current progress. Finally, we use PubTables-v2 to create the Page-Object Table Transformer (POTATR), an image-to-graph extension of the Table Transformer to comprehensive page-level TE. Data, code, and trained models will be released.
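
To make the idea of a graph-structured, multi-page table representation more concrete, here is a minimal sketch in Python. It is not POTATR's output format or the PubTables-v2 annotation schema, neither of which is specified in the abstract. The class names (`PageObject`, `DocumentGraph`), the `continues_on` relation, and the bounding boxes are all hypothetical, chosen only to illustrate how table fragments detected on separate pages might be linked into one logical table.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PageObject:
    """One detected object on a page (hypothetical schema, for illustration)."""
    obj_id: str
    page: int                                   # zero-based page index
    kind: str                                   # e.g. "table", "caption"
    bbox: tuple[float, float, float, float]     # (x0, y0, x1, y1)


@dataclass
class DocumentGraph:
    """Page objects as nodes, typed relations as directed edges."""
    nodes: dict[str, PageObject] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (src, relation, dst)

    def add(self, obj: PageObject) -> None:
        self.nodes[obj.obj_id] = obj

    def link(self, src: str, relation: str, dst: str) -> None:
        self.edges.append((src, relation, dst))

    def logical_tables(self) -> list[list[str]]:
        """Group table fragments joined by 'continues_on' edges, in page order."""
        nxt = {s: d for s, r, d in self.edges if r == "continues_on"}
        continued = set(nxt.values())  # fragments that are not the start of a table
        tables: list[list[str]] = []
        for obj in sorted(self.nodes.values(), key=lambda o: o.page):
            if obj.kind != "table" or obj.obj_id in continued:
                continue
            chain = [obj.obj_id]
            # Follow continuation edges; the membership check guards against cycles.
            while chain[-1] in nxt and nxt[chain[-1]] not in chain:
                chain.append(nxt[chain[-1]])
            tables.append(chain)
        return tables


# Hypothetical example: one table split across pages 0 and 1, plus a second table.
g = DocumentGraph()
g.add(PageObject("t1a", 0, "table", (50.0, 400.0, 550.0, 780.0)))
g.add(PageObject("t1b", 1, "table", (50.0, 60.0, 550.0, 300.0)))
g.add(PageObject("t2", 1, "table", (50.0, 350.0, 550.0, 700.0)))
g.link("t1a", "continues_on", "t1b")
print(g.logical_tables())  # [['t1a', 't1b'], ['t2']]
```

In this sketch the two fragments `t1a` and `t1b` are reported as a single logical table, which is the kind of cross-page grouping that a multi-page table structure recognition benchmark would need to annotate and evaluate.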
Problem

Research questions and friction points this paper is trying to address.

Lack of annotated data for full-page table extraction
Need for a benchmark for multi-page table structure recognition
Developing comprehensive page-level table extraction methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Created large-scale dataset PubTables-v2 for table extraction
Evaluated domain-specialized vision-language models on challenging tasks
Developed Page-Object Table Transformer for comprehensive page-level extraction