🤖 AI Summary
Problem: End-to-end image-based table visual question answering (TableVQA) systems lack the interpretability and auditability required in high-stakes domains such as finance and healthcare.
Method: We propose a modular, interpretable multimodal QA framework that (i) converts table images into structured tables via integrated OCR and multimodal table understanding; (ii) produces step-by-step natural-language reasoning with chain-of-thought prompting; (iii) generates executable Python/Pandas code from those reasoning steps, using execution feedback to handle errors; and (iv) executes the code and returns the answer together with a natural-language explanation of how it was computed.
Contribution/Results: Unlike opaque end-to-end models, our approach keeps the full pipeline transparent, from image input to answer output, so every intermediate result (parsed table, reasoning steps, generated code) can be inspected. Evaluated on TableVQA-Bench against existing baselines, it improves interpretability and audit readiness. This supports trustworthy AI deployment in sensitive applications where results must be audited.
📝 Abstract
We present ExpliCIT-QA, a system that extends our previous MRT approach for tabular question answering into a multimodal pipeline capable of handling complex table images and providing explainable answers. ExpliCIT-QA follows a modular design consisting of: (1) Multimodal Table Understanding, which uses a Chain-of-Thought approach to extract and transform content from table images; (2) Language-based Reasoning, where a step-by-step natural-language explanation of how to solve the problem is generated; (3) Automatic Code Generation, where Python/Pandas scripts are created from the reasoning steps, with error feedback used to repair failing code; (4) Code Execution, which computes the final answer; and (5) Natural Language Explanation, which describes how the answer was computed. The system is built for transparency and auditability: all intermediate outputs (parsed tables, reasoning steps, generated code, and final answers) are available for inspection. This strategy works towards closing the explainability gap in end-to-end TableVQA systems. We evaluated ExpliCIT-QA on the TableVQA-Bench benchmark, comparing it with existing baselines. We demonstrated improvements in interpretability and transparency, which open the door to applications in sensitive domains such as finance and healthcare, where auditing results is critical.
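
To make the generate, execute, and repair loop of stages (3) and (4) concrete, the following is a minimal sketch and not the authors' implementation. All names (`AuditTrail`, `answer_question`, `stub_generator`) are hypothetical, the language model is replaced by a hard-coded stub, and a plain list of dictionaries stands in for the parsed table so the example runs without Pandas. It shows how an execution error can be fed back to the code generator while every intermediate output is recorded for later inspection.

```python
from dataclasses import dataclass, field


@dataclass
class AuditTrail:
    """Collects every intermediate output so a reviewer can inspect it later."""
    steps: list = field(default_factory=list)

    def record(self, stage, output):
        self.steps.append((stage, output))


def run_generated_code(code, table):
    """Run a generated script; by convention (assumed here) it sets `answer`."""
    namespace = {"table": table}
    exec(code, namespace)  # a real deployment would need sandboxing
    return namespace["answer"]


def answer_question(table, question, generate_code, max_attempts=3):
    """Generate code, execute it, and feed any error back to the generator."""
    trail = AuditTrail()
    trail.record("parsed_table", table)
    trail.record("question", question)
    error = None
    for attempt in range(1, max_attempts + 1):
        code = generate_code(question, error)
        trail.record(f"generated_code_attempt_{attempt}", code)
        try:
            answer = run_generated_code(code, table)
        except Exception as exc:
            # Keep the error message: it is both audit evidence and feedback.
            error = f"{type(exc).__name__}: {exc}"
            trail.record(f"execution_error_attempt_{attempt}", error)
            continue
        trail.record("final_answer", answer)
        return answer, trail
    raise RuntimeError(f"no runnable code after {max_attempts} attempts")


def stub_generator(question, error):
    """Stand-in for the model: the first draft uses a wrong column name."""
    if error is None:
        return "answer = sum(row['Revenue'] for row in table)"
    return "answer = sum(row['revenue'] for row in table)"


table = [{"quarter": "Q1", "revenue": 120}, {"quarter": "Q2", "revenue": 150}]
answer, trail = answer_question(table, "What is the total revenue?", stub_generator)
for stage, output in trail.steps:
    print(f"{stage}: {output}")
```

In this sketch the first generated script fails with a `KeyError`, the error text is passed to the second call of the generator, and the repaired script returns 270. The recorded trail then contains the parsed table, the question, both code attempts, the error, and the final answer, which is the kind of inspectable record the abstract describes.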