🤖 AI Summary
Existing table-based question answering methods (e.g., NL2SQL) excel at factual retrieval but struggle with probabilistic queries that require reasoning under uncertainty. To address this gap, we propose LUCARIO, the first benchmark dedicated to probabilistic question answering over tabular data, together with a hybrid symbolic-neural framework: it first models the table as a Bayesian network to capture probabilistic dependencies among variables, then uses large language models (LLMs) to parse natural language questions into formal probabilistic queries and to generate interpretable answers. This design combines the precise, logically grounded inference of symbolic systems with the robust linguistic understanding of neural models. Experiments on the LUCARIO benchmark show that our approach significantly outperforms diverse baselines, with an average absolute accuracy gain of +18.7%. To our knowledge, this is the first systematic validation of the effectiveness and scalability of coupling Bayesian modeling with LLMs for probabilistic table QA.
📝 Abstract
Current approaches to question answering (QA) over tabular data, such as NL2SQL systems, perform well on factual questions whose answers can be retrieved directly from tables. However, they fall short on probabilistic questions that require reasoning under uncertainty. In this paper, we introduce LUCARIO, a new benchmark, and a framework for probabilistic QA over large tabular data. Our method induces Bayesian networks from tables, translates natural-language questions into formal probabilistic queries, and uses large language models (LLMs) to generate the final answers. Empirical results demonstrate significant improvements over baselines, highlighting the benefits of hybrid symbolic-neural reasoning.
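To make the pipeline concrete, here is a minimal, self-contained sketch of the symbolic half of such a system. The table, column names, and values are invented for illustration (not from the paper), and the conditional-probability estimate by counting corresponds to a single maximum-likelihood CPD entry of the kind a Bayesian network induced from the table would store; in the full framework, an LLM would produce the formal query from a natural-language question.

```python
# Toy table: each row is a record with categorical columns.
# Columns and values are illustrative, not from the paper.
rows = [
    {"region": "EU", "churn": "yes"},
    {"region": "EU", "churn": "no"},
    {"region": "EU", "churn": "yes"},
    {"region": "US", "churn": "no"},
    {"region": "US", "churn": "no"},
]

def conditional_prob(rows, target, evidence):
    """Estimate P(target_col = target_val | evidence) by counting rows,
    i.e. the maximum-likelihood estimate of one CPD entry."""
    col, val = target
    matching = [r for r in rows if all(r[k] == v for k, v in evidence.items())]
    if not matching:
        return 0.0
    return sum(r[col] == val for r in matching) / len(matching)

# A question like "How likely are EU customers to churn?" would be
# parsed (by the LLM in the paper's framework) into a formal query:
p = conditional_prob(rows, target=("churn", "yes"), evidence={"region": "EU"})
print(p)  # 2 of 3 EU rows have churn = yes → 0.666...
```

In the actual system a full Bayesian network with inference (e.g., variable elimination) replaces this direct counting, which is what lets the method answer queries over variables that are only indirectly related in the table.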