Leveraging Vision Capabilities of Multimodal LLMs for Automated Data Extraction from Plots

📅 2025-03-16
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Automated extraction of data from dual-axis plots in scientific papers remains challenging due to heavy reliance on manual annotation and a scarcity of robust, generalizable solutions. Method: This paper introduces PlotExtract, a zero-shot, fine-tuning-free multimodal large language model (MLLM)-based chart parsing method. It leverages the inherent visual understanding capabilities of pre-trained MLLMs and employs a lightweight, interpretable chain-of-thought (CoT) prompting workflow to jointly localize coordinates and parse numerical values end-to-end. Contribution/Results: We provide the first empirical validation that off-the-shelf MLLMs can achieve high-accuracy plot data extraction under zero-shot settings. On both synthetic and real-world scientific paper datasets, PlotExtract attains >90% precision, ~90% recall, and x/y coordinate errors ≤5%, while significantly improving throughput and robustness. This establishes a novel paradigm for scientific literature data mining.
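The summary's accuracy claim (coordinate errors ≤5%) rests on the step every plot-digitization pipeline needs: converting pixel positions of detected points into data coordinates using the localized axes. As a minimal illustrative sketch (not the paper's code; the function name and calibration format are assumptions), this step can be written as a linear interpolation between two labeled ticks per axis:

```python
def pixel_to_data(px, py, x_axis, y_axis):
    """Map a pixel position to data coordinates by linear interpolation.

    x_axis and y_axis are ((pixel_min, value_min), (pixel_max, value_max))
    calibration pairs, e.g. taken from two labeled ticks on each axis.
    Note: assumes linear axes; log axes would interpolate in log space.
    """
    (xp0, xv0), (xp1, xv1) = x_axis
    (yp0, yv0), (yp1, yv1) = y_axis
    x = xv0 + (px - xp0) * (xv1 - xv0) / (xp1 - xp0)
    y = yv0 + (py - yp0) * (yv1 - yv0) / (yp1 - yp0)
    return x, y
```

Note that the y-axis pixel coordinate typically decreases upward in image space, which the interpolation handles automatically as long as the calibration pairs are consistent.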

๐Ÿ“ Abstract
Automated data extraction from research texts has been steadily improving, with the emergence of large language models (LLMs) accelerating progress even further. Extracting data from plots in research papers, however, has been such a complex task that it has predominantly been confined to manual data extraction. We show that current multimodal large language models, with proper instructions and engineered workflows, are capable of accurately extracting data from plots. This capability is inherent to the pretrained models and can be achieved with a chain-of-thought sequence of zero-shot engineered prompts we call PlotExtract, without the need to fine-tune. We demonstrate PlotExtract here and assess its performance on synthetic and published plots. We consider only plots with two axes in this analysis. For plots identified as extractable, PlotExtract finds points with over 90% precision (and around 90% recall) and errors in x and y position of around 5% or lower. These results prove that multimodal LLMs are a viable path for high-throughput data extraction for plots and in many circumstances can replace the current manual methods of data extraction.
Problem

Research questions and friction points this paper is trying to address.

Automated data extraction from plots in research papers.
Utilizing multimodal LLMs for accurate plot data extraction.
Replacing manual methods with high-throughput automated solutions.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal LLMs extract data from plots.
PlotExtract uses zero-shot engineered prompts.
Achieves high precision and recall rates.
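The zero-shot engineered prompts described above are issued as a staged, chain-of-thought sequence to a multimodal model. A hedged sketch of how such a request could be assembled (the stage wording is paraphrased from the summary, not the paper's actual prompts, and the message schema follows the common chat-style multimodal format as an assumption):

```python
import base64

# Staged zero-shot prompts in the spirit of PlotExtract's CoT workflow;
# illustrative wording only, not the paper's actual prompt text.
STAGES = [
    "Is this a two-axis plot with extractable data series? Answer yes or no.",
    "Identify the x- and y-axis labels, units, and the pixel positions and "
    "values of two labeled ticks on each axis.",
    "List every visible data point as (x, y) in data coordinates, one per line.",
]

def build_messages(image_bytes, stage):
    """Assemble a chat-style multimodal request for one extraction stage."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": STAGES[stage]},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }]
```

Running the stages in order lets an early "no" answer skip unextractable plots, which is consistent with the paper's "plots identified as extractable" filtering.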
Maciej P. Polak
Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706-1595, USA
Dane Morgan
University of Wisconsin, Materials Science