General Table Question Answering via Answer-Formula Joint Generation

📅 2025-03-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing TableQA approaches, such as SQL or Python generation and free-text answering, generalize poorly across question types and are not robust to varied table structures. To address this, the paper introduces spreadsheet formulas as a unified logical representation for reasoning across diverse table structures and question types. Its key contributions are: (1) an answer-and-formula joint generation paradigm; (2) FormulaQA, a large-scale TableQA dataset with formula annotations, built from existing datasets; and (3) TabAF, an end-to-end framework that decodes both answers and formulas with a single LLM backbone. With Llama3.1-70B as the backbone, TabAF achieves state-of-the-art performance on WikiTableQuestions, HiTab, and TabFact.

📝 Abstract

Advanced table question answering (TableQA) methods prompt large language models (LLMs) to generate answer text, SQL queries, Python code, or custom operations, which substantially improves complex reasoning in the TableQA task. However, these methods lack the versatility to cope with specific question types or table structures. In contrast, the spreadsheet formula, a widely used and well-defined operation language for tabular data, has not been thoroughly explored for TableQA. In this paper, we make a first attempt to use formulas as the logical form for complex reasoning over tables with different structures. Specifically, we construct FormulaQA, a large formula-annotated TableQA dataset, from existing datasets. In addition, we propose TabAF, a general table answering framework that solves multiple types of tasks over multiple types of tables simultaneously. Unlike existing methods, TabAF decodes answers and formulas with a single LLM backbone, demonstrating great versatility and generalization. TabAF based on Llama3.1-70B achieves new state-of-the-art performance on WikiTableQuestions, HiTab, and TabFact.
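To make the idea of a formula as a logical form concrete, here is a minimal Python sketch of how a generated spreadsheet formula could be executed against a table to produce an answer. It is an illustration under stated assumptions, not the paper's executor: the toy table, the `evaluate` helper, and the restriction to a single aggregate function over one A1-style range are all hypothetical simplifications.

```python
import re

# Hypothetical toy table: row 1 is the header, rows 2-4 hold the data.
TABLE = [
    ["Country", "Gold", "Silver"],
    ["Norway", 16, 8],
    ["Germany", 12, 10],
    ["Canada", 4, 8],
]

# Small, assumed subset of spreadsheet aggregate functions.
FUNCS = {
    "SUM": sum,
    "MAX": max,
    "MIN": min,
    "AVERAGE": lambda xs: sum(xs) / len(xs),
}


def cell_to_index(ref):
    """Convert an A1-style reference such as 'B2' to zero-based (row, col)."""
    match = re.fullmatch(r"([A-Z])(\d+)", ref)
    if match is None:
        raise ValueError(f"unsupported cell reference: {ref}")
    col = ord(match.group(1)) - ord("A")
    row = int(match.group(2)) - 1
    return row, col


def range_values(table, start, end):
    """Collect the cell values in the rectangular range start:end."""
    r1, c1 = cell_to_index(start)
    r2, c2 = cell_to_index(end)
    return [table[r][c] for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]


def evaluate(formula, table):
    """Evaluate a formula of the form '=FUNC(A1:B2)' against the table."""
    match = re.fullmatch(r"=([A-Z]+)\(([A-Z]\d+):([A-Z]\d+)\)", formula)
    if match is None:
        raise ValueError(f"unsupported formula: {formula}")
    name, start, end = match.groups()
    if name not in FUNCS:
        raise ValueError(f"unsupported function: {name}")
    return FUNCS[name](range_values(table, start, end))


# Question: "How many gold medals were won in total?"
print(evaluate("=SUM(B2:B4)", TABLE))  # 32
```

Because the formula addresses cells by position rather than by column name, the same representation could in principle apply to tables without a clean relational schema, which is the versatility the abstract argues for.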

Problem

Research questions and friction points this paper is trying to address.

Enhance table question answering with versatile formula-based solutions.

Address limitations of current methods in handling diverse table structures.

Propose a unified framework for generating answers and formulas simultaneously.

Innovation

Methods, ideas, or system contributions that make the work stand out.

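Uses spreadsheet formulas as the logical form for TableQA tasks.

Constructs FormulaQA, a formula-annotated dataset built from existing TableQA datasets.

Proposes the TabAF framework, which decodes answers and formulas with a single LLM backbone.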

🔎 Similar Papers

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering