POTATR: A Lightweight Image-to-Graph Model for Page-Level Table Extraction

📅 2026-06-08

🤖 AI Summary

Existing table extraction methods struggle to combine high accuracy, computational efficiency, and contextual awareness. This work proposes POTATR (Page-Object Table Transformer), a lightweight 29-million-parameter image-to-graph model that extends the Table Transformer (TATR) to contextualized page-level table extraction. On the PubTables-v2 Single Pages benchmark, POTATR outperforms every model tested, including frontier multimodal LLMs, achieving a GriTS_Con score of 0.964 while running over 130× faster at roughly 300× lower cost. Its output is spatially grounded: every recognized element has a bounding box, which allows visual verification and geometric text assignment. Because it composes with other models, POTATR can be extended to scanned documents via external OCR and to full-document extraction via techniques such as cross-page merging.

📝 Abstract

Large-scale document processing requires contextually aware table extraction (TE) that is both accurate and efficient. Yet current approaches require billions of parameters, hundreds of autoregressive steps, or costly API inference. Motivated by this, we introduce the Page-Object Table Transformer (POTATR), a lightweight 29M parameter image-to-graph model that extends the Table Transformer (TATR) for contextualized page-level TE. POTATR outperforms all models tested on the PubTables-v2 Single Pages benchmark -- including frontier MLLMs -- achieving $\textrm{GriTS}_\textrm{Con}$ of 0.964 while running over 130$\times$ faster at roughly 300$\times$ lower cost. Further, POTATR's output is spatially grounded: every recognized element has a bounding box, enabling visual verification and geometric text assignment. As a result, POTATR performs unified page-level TE while composing with other models, enabling extension to scanned documents via external OCR and to full-document TE via techniques like cross-page merging. Code and models will be released.
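The abstract does not describe how geometric text assignment is carried out. As a minimal sketch of the general idea, assuming cell bounding boxes from a table model and word bounding boxes from an external OCR engine or PDF text layer, each word can be assigned to the cell that covers most of its area. The names `Box`, `assign_words_to_cells`, and the `min_overlap` threshold are hypothetical and are not taken from the POTATR code.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box in page coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two boxes; 0.0 if they do not overlap."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0.0, w) * max(0.0, h)


def assign_words_to_cells(cells, words, min_overlap=0.5):
    """Assign each (text, Box) word to the cell covering most of its area.

    Words whose best cell covers less than `min_overlap` of the word are
    dropped. Returns one string per cell, in the order of `cells`.
    """
    buckets = [[] for _ in cells]
    for text, wbox in words:
        word_area = wbox.area()
        if word_area == 0.0:
            continue  # degenerate box, nothing to measure
        best_idx, best_frac = None, 0.0
        for i, cell in enumerate(cells):
            frac = intersection_area(cell, wbox) / word_area
            if frac > best_frac:
                best_idx, best_frac = i, frac
        if best_idx is not None and best_frac >= min_overlap:
            buckets[best_idx].append((wbox.y0, wbox.x0, text))
    # Naive reading order: sort by top edge, then left edge.
    # Real pages usually need line grouping with a tolerance on y.
    return [" ".join(t for _, _, t in sorted(b)) for b in buckets]


cells = [Box(0, 0, 50, 20), Box(50, 0, 100, 20)]
words = [
    ("Total", Box(5, 5, 30, 15)),
    ("42", Box(60, 5, 75, 15)),
    ("cost", Box(32, 5, 48, 15)),
    ("stray", Box(200, 200, 220, 210)),  # outside every cell, so dropped
]
cell_text = assign_words_to_cells(cells, words)
```

Because the assignment depends only on geometry, the same routine works whether the word boxes come from a PDF text layer or from OCR on a scanned page, which is the kind of composition with external models that the abstract refers to.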

Problem

Research questions and friction points this paper is trying to address.

table extraction

document processing

page-level analysis

lightweight model

contextual awareness

Innovation

Methods, ideas, or system contributions that make the work stand out.

lightweight

image-to-graph

table extraction