🤖 AI Summary
Existing table-based question answering methods (e.g., NL2SQL) excel at factual retrieval but struggle with probabilistic queries that require reasoning under uncertainty. To address this gap, we propose LUCARIO, the first benchmark dedicated to probabilistic question answering over tabular data, together with a hybrid symbolic-neural framework: it first models the table as a Bayesian network to capture probabilistic dependencies among variables, then uses large language models (LLMs) to parse natural language questions into formal probabilistic queries and to generate interpretable answers. This design combines the precise, logically grounded inference of symbolic systems with the robust linguistic understanding of neural models. Experiments on the LUCARIO benchmark show that our approach significantly outperforms diverse baselines, with an average absolute accuracy gain of +18.7%. To our knowledge, this is the first systematic validation of the effectiveness and scalability of coupling Bayesian modeling with LLMs for probabilistic table QA.
📝 Abstract
Current approaches to question answering (QA) over tabular data, such as NL2SQL systems, perform well on factual questions whose answers can be retrieved directly from tables. However, they fall short on probabilistic questions that require reasoning under uncertainty. In this paper, we introduce LUCARIO, a new benchmark, and a framework for probabilistic QA over large tabular data. Our method induces Bayesian networks from tables, translates natural-language questions into formal probabilistic queries, and uses large language models (LLMs) to generate the final answers. Empirical results demonstrate significant improvements over baselines, highlighting the benefits of hybrid symbolic-neural reasoning.
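To make the pipeline concrete, here is a minimal, self-contained sketch of the symbolic half of such a system. The table, column names, and values are invented for illustration (not from the paper), and the conditional-probability estimate by counting corresponds to a single maximum-likelihood CPD entry of the kind a Bayesian network induced from the table would store; in the full framework, an LLM would produce the formal query from a natural-language question.

```python
# Toy table: each row is a record with categorical columns.
# Columns and values are illustrative, not from the paper.
rows = [
    {"region": "EU", "churn": "yes"},
    {"region": "EU", "churn": "no"},
    {"region": "EU", "churn": "yes"},
    {"region": "US", "churn": "no"},
    {"region": "US", "churn": "no"},
]

def conditional_prob(rows, target, evidence):
    """Estimate P(target_col = target_val | evidence) by counting rows,
    i.e. the maximum-likelihood estimate of one CPD entry."""
    col, val = target
    matching = [r for r in rows if all(r[k] == v for k, v in evidence.items())]
    if not matching:
        return 0.0
    return sum(r[col] == val for r in matching) / len(matching)

# A question like "How likely are EU customers to churn?" would be
# parsed (by the LLM in the paper's framework) into a formal query:
p = conditional_prob(rows, target=("churn", "yes"), evidence={"region": "EU"})
print(p)  # 2 of 3 EU rows have churn = yes → 0.666...
```

In the actual system a full Bayesian network with inference (e.g., variable elimination) replaces this direct counting, which is what lets the method answer queries over variables that are only indirectly related in the table.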