AI Summary
This work addresses the scarcity of high-quality human-annotated data for Table Question Answering (TQA). We propose the first LLM self-improvement framework for TQA based on synthetically generated data. Methodologically, we model chain-of-thought reasoning as a discrete state sequence, introduce a state-level scoring mechanism and process-aware contrastive sampling, and apply lightweight preference learning via a PPO variant for reinforcement fine-tuning. Using only 8,000 self-generated preference pairs, our approach achieves up to +5.0% accuracy gain on in-domain test sets and +2.4% improvement in out-of-domain generalization. It attains 5× faster inference than current SOTA models while matching the performance of significantly larger systems. Our core contribution is the first efficient, low-overhead, process-aware self-improvement paradigm for TQA, uniquely balancing generalizability, inference efficiency, and scalability.
Abstract
Improving large language models (LLMs) with self-generated data has demonstrated success in tasks such as mathematical reasoning and code generation. Yet, this direction remains unexplored for table question answering (TQA), where a system answers questions based on tabular data. Addressing this gap matters because effective self-improvement can boost TQA performance without requiring costly manual annotation. In this work, we propose PPT, a Process-based Preference learning framework for TQA. It decomposes reasoning chains into discrete states, assigns a score to each state, and samples contrastive steps for preference learning. Experimental results show that PPT improves TQA models by up to 5% on in-domain datasets and 2.4% on out-of-domain datasets, using only 8,000 preference pairs. Furthermore, the resulting models achieve results competitive with more complex and larger state-of-the-art TQA systems, while being five times more efficient during inference.
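To make the pipeline concrete, here is a minimal sketch of how state decomposition and process-aware contrastive sampling could fit together. All names (`State`, `sample_preference_pairs`), the margin threshold, and the prefix-matching heuristic are illustrative assumptions, not the paper's actual implementation: the idea is that two sampled reasoning chains sharing a prefix but diverging at some state yield a (chosen, rejected) pair when their state-level scores differ enough.

```python
# Hypothetical sketch: build preference pairs from scored reasoning states.
# Assumes chains are already decomposed into states and each state carries a
# score (e.g., from an answer-consistency or outcome-based check).
from dataclasses import dataclass

@dataclass
class State:
    text: str      # one discrete reasoning step
    score: float   # state-level score in [0, 1]

def sample_preference_pairs(chains, margin=0.5):
    """For every ordered pair of chains, find the first state where they
    diverge after a shared prefix; if the score gap exceeds `margin`,
    emit a (shared_prefix, chosen_step, rejected_step) triple."""
    pairs = []
    for chain_a in chains:
        for chain_b in chains:
            # length of the longest shared prefix of identical steps
            k = 0
            while (k < min(len(chain_a), len(chain_b))
                   and chain_a[k].text == chain_b[k].text):
                k += 1
            # need a nonempty prefix and a divergent state in both chains
            if k == 0 or k >= len(chain_a) or k >= len(chain_b):
                continue
            better, worse = chain_a[k], chain_b[k]
            if better.score - worse.score >= margin:
                prefix = " ".join(s.text for s in chain_a[:k])
                pairs.append((prefix, better.text, worse.text))
    return pairs
```

Such triples could then feed any pairwise preference objective; the summary above states the actual training uses a PPO variant for lightweight reinforcement fine-tuning.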