ExpliCIT-QA: Explainable Code-Based Image Table Question Answering

📅 2025-07-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Image-based table visual question answering (TableVQA) systems lack interpretability and auditability in high-stakes domains such as finance and healthcare. Method: We propose a modular, interpretable multimodal QA framework that (i) structures image tables via integrated OCR and multimodal table understanding; (ii) generates executable Python/Pandas code using chain-of-thought prompting and natural language inference; and (iii) concurrently produces natural-language intermediate reasoning traces and full computational derivations. Contribution/Results: Unlike opaque end-to-end models, our approach ensures full pipeline transparency—from image input to answer output—enabling rigorous intermediate-result inspection. Evaluated on TableVQA-Bench, it achieves competitive accuracy while substantially improving interpretability and audit readiness. This establishes a new paradigm for trustworthy AI deployment in safety-critical applications.

📝 Abstract
We present ExpliCIT-QA, a system that extends our previous MRT approach for tabular question answering into a multimodal pipeline capable of handling complex table images and providing explainable answers. ExpliCIT-QA follows a modular design consisting of: (1) Multimodal Table Understanding, which uses a Chain-of-Thought approach to extract and transform content from table images; (2) Language-based Reasoning, where a step-by-step explanation in natural language is generated to solve the problem; (3) Automatic Code Generation, where Python/Pandas scripts are created from the reasoning steps, with error feedback for handling failures; (4) Code Execution to compute the final answer; and (5) Natural Language Explanation describing how the answer was computed. The system is built for transparency and auditability: all intermediate outputs (parsed tables, reasoning steps, generated code, and final answers) are available for inspection. This strategy works towards closing the explainability gap in end-to-end TableVQA systems. We evaluated ExpliCIT-QA on the TableVQA-Bench benchmark against existing baselines and demonstrated improvements in interpretability and transparency, which open the door to applications in sensitive domains like finance and healthcare where auditing results is critical.
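Steps (3) and (4) of the pipeline can be sketched as a small execute-and-retry loop: generated Pandas code is run over the parsed table, and any execution error is fed back for regeneration. This is a minimal illustration only; `generate_code` here is a hypothetical stub standing in for the LLM call, and the retry limit and variable names are assumptions, not the paper's implementation.

```python
import pandas as pd

def generate_code(question, error=None):
    # Stub: a real system would prompt an LLM with the question,
    # the table schema, the reasoning steps, and any previous error.
    return "answer = df.loc[df['revenue'].idxmax(), 'company']"

def execute_with_feedback(df, question, max_retries=2):
    """Run model-generated Pandas code, retrying on failure."""
    error = None
    for _ in range(max_retries + 1):
        code = generate_code(question, error)
        env = {"df": df}
        try:
            exec(code, {"pd": pd}, env)   # execute the generated script
            return env["answer"], code    # keep the code as an audit trail
        except Exception as exc:
            error = str(exc)              # fed back into the next prompt
    raise RuntimeError(f"code generation failed: {error}")

# Table as parsed from the image in step (1)
df = pd.DataFrame({"company": ["A", "B"], "revenue": [120, 340]})
answer, trace = execute_with_feedback(df, "Which company has the highest revenue?")
```

Keeping the executed code string alongside the answer is what makes the result auditable: an inspector can re-run the exact script against the parsed table.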
Problem

Research questions and friction points this paper is trying to address.

Handling complex table images for question answering
Providing explainable answers with step-by-step reasoning
Generating executable code for transparent result computation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Chain-of-Thought table understanding
Step-by-step natural language reasoning
Automatic Python/Pandas code generation
Maximiliano Hormazábal Lagos
PhD Student
Computer Vision · Natural Language Processing · Document Image Analysis · Vision Language Models
Álvaro Bueno Sáez
Gradiant, Vigo, Galicia, Spain
Pedro Alonso Doval
Gradiant, Vigo, Galicia, Spain
Jorge Alcalde Vesteiro
Gradiant, Vigo, Galicia, Spain
Héctor Cerezo-Costas
Gradiant, Vigo, Galicia, Spain