Utilizing Training Data to Improve LLM Reasoning for Tabular Understanding

📅 2025-08-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the trade-off between generalization and use of training data for large language models (LLMs) in table understanding and reasoning. It proposes LRTab, a prompting-based framework that turns incorrect chain-of-thought (CoT) responses on the training data into retrievable Prompt Conditions, each describing how to avoid the corresponding error. The conditions are checked against validation data, and at inference time the most relevant ones are retrieved and added to the prompt as extra context. On WikiTQ and TabFact, the authors report that LRTab is interpretable and cost-efficient and can outperform previous baselines in tabular reasoning.

📝 Abstract

Automated tabular understanding and reasoning are essential tasks for data scientists. Recently, large language models (LLMs) have become increasingly prevalent in tabular reasoning tasks. Previous work focuses on (1) finetuning LLMs using labeled data or (2) training-free prompting of LLM agents using chain-of-thought (CoT). Finetuning offers dataset-specific learning at the cost of generalizability. Training-free prompting is highly generalizable but does not take full advantage of training data. In this paper, we propose a novel prompting-based reasoning approach, Learn then Retrieve (LRTab), which integrates the benefits of both by retrieving relevant information learned from training data. We first use prompting to obtain CoT responses over the training data. For incorrect CoTs, we prompt the LLM to predict Prompt Conditions to avoid the error, learning insights from the data. We validate the effectiveness of Prompt Conditions using validation data. Finally, at inference time, we retrieve the most relevant Prompt Conditions as additional context for table understanding. We provide comprehensive experiments on WikiTQ and TabFact, showing that LRTab is interpretable, cost-efficient, and can outperform previous baselines in tabular reasoning.
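
The learn-then-retrieve flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm` callable, the prompt wording, the substring-based correctness check, and the bag-of-words cosine similarity used for retrieval are all hypothetical stand-ins, since the abstract does not specify them.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical LLM interface: any function mapping a prompt string to a reply.
LLM = Callable[[str], str]
Example = Tuple[str, str, str]  # (table, question, gold answer)


def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def learn_conditions(llm: LLM, train: List[Example]) -> List[Tuple[str, str]]:
    """Collect (question, Prompt Condition) pairs from incorrect CoT responses."""
    store = []
    for table, question, gold in train:
        cot = llm(f"Table:\n{table}\nQuestion: {question}\nThink step by step.")
        # Assumption: a response counts as correct if it contains the gold answer.
        if gold.lower() not in cot.lower():
            condition = llm(
                f"The reasoning below was wrong.\n{cot}\n"
                f"Correct answer: {gold}\n"
                "State a condition that would avoid this error."
            )
            store.append((question, condition))
    return store


def retrieve(store: List[Tuple[str, str]], question: str, k: int = 2) -> List[str]:
    """Return the k conditions whose source questions are most similar."""
    q = bow(question)
    ranked = sorted(store, key=lambda item: cosine(q, bow(item[0])), reverse=True)
    return [condition for _, condition in ranked[:k]]
```

At inference time, the strings returned by `retrieve` would be appended to the prompt as additional context before asking the model to answer the new question.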

Problem

Research questions and friction points this paper is trying to address.

Improving LLM reasoning for tabular data understanding

Integrating training data benefits into prompt-based methods

Retrieving relevant learned conditions for better inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning insights from training data via prompting

Validating prompt conditions using validation data

Enhancing inference with relevant prompt conditions
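
The validation step listed above might look something like the sketch below. The source says only that Prompt Conditions are validated on validation data, so the acceptance rule used here (keep a condition only if it raises exact-match accuracy over a no-condition baseline) and the `answer` callable are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str, str]  # (table, question, gold answer)
# Hypothetical answer function: (table, question, extra_context) -> answer string.
Answerer = Callable[[str, str, str], str]


def accuracy(answer: Answerer, data: List[Example], context: str) -> float:
    """Exact-match accuracy (case-insensitive) with `context` added to each query."""
    if not data:
        return 0.0
    hits = sum(
        answer(table, question, context).strip().lower() == gold.strip().lower()
        for table, question, gold in data
    )
    return hits / len(data)


def validate_conditions(
    answer: Answerer, candidates: List[str], validation: List[Example]
) -> List[str]:
    """Keep only candidate conditions that beat the no-condition baseline."""
    baseline = accuracy(answer, validation, "")
    return [c for c in candidates if accuracy(answer, validation, c) > baseline]
```

A stricter or looser rule (for example, keeping conditions that do not hurt accuracy) would slot into the final comparison without changing the rest of the sketch.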

🔎 Similar Papers

NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization