🤖 AI Summary
Existing scientific document table extraction (TE) methods suffer from poor generalization, low robustness, and limited interpretability when applied to heterogeneous data. Method: We introduce the first large-scale heterogeneous scientific table benchmark, comprising 37k multi-source samples, and propose an end-to-end fine-grained evaluation framework that decouples sub-tasks (e.g., table detection and structure recognition), quantifies model uncertainty via confidence scoring, and systematically exposes deficiencies in conventional evaluation metrics. We conduct a unified comparative study spanning PDF parsing libraries, domain-specific tools, computer vision models, and multimodal large language models. Results: Empirical evaluation reveals substantial performance degradation of state-of-the-art TE methods under real-world heterogeneity, validating the framework's role in advancing robust, interpretable, and reproducible TE research.
📝 Abstract
Table Extraction (TE) is the task of extracting tables from PDF documents into a structured format that can be processed automatically. While numerous TE tools exist, the variety of methods and techniques makes it difficult for users to choose an appropriate one. We propose a novel benchmark for assessing end-to-end TE methods (from the PDF to the final table). We contribute an analysis of TE evaluation metrics and the design of a rigorous evaluation process, which scores each TE sub-task as well as end-to-end TE and captures model uncertainty. In addition to a prior dataset, our benchmark comprises two new heterogeneous datasets totaling 37k samples. We run our benchmark on diverse models, including off-the-shelf libraries, software tools, large vision-language models, and computer-vision-based approaches. The results demonstrate that TE remains challenging: current methods lack generalizability on heterogeneous data and suffer from limited robustness and interpretability.