🤖 AI Summary
Existing large language models (LLMs) fine-tuned for tabular data degrade significantly in both out-of-domain generalization and general-purpose capabilities; conventional fine-tuning often improves tabular performance at the expense of generality.
Method: This paper systematically identifies the critical role of hyperparameters (especially the learning rate) in balancing specialized and general capabilities, and proposes TAMA: a lightweight instruction-tuning paradigm that applies a small learning rate and relatively few training instances atop LLaMA-3.1-8B-Instruct to jointly enhance tabular understanding and general-purpose reasoning.
Contribution/Results: TAMA matches or surpasses GPT-3.5/4 across diverse tabular tasks while preserving strong performance on general benchmarks (e.g., MMLU, BBH) and out-of-domain tabular generalization. It achieves this with substantially reduced annotation cost and training overhead, challenging the prevailing assumption that tabular fine-tuning inevitably compromises general capability.
📝 Abstract
Recent advances in table understanding have focused on instruction-tuning large language models (LLMs) for table-related tasks. However, existing research has overlooked the impact of hyperparameter choices and lacks a comprehensive evaluation of the out-of-domain table understanding ability and the general capabilities of these table LLMs. In this paper, we evaluate these abilities in existing table LLMs and reveal significant declines in both out-of-domain table understanding and general capabilities compared to their base models. Through systematic analysis, we show that hyperparameters, such as learning rate, can significantly influence both table-specific and general capabilities. Contrary to existing table instruction-tuning work, we demonstrate that smaller learning rates and fewer training instances can enhance table understanding while preserving general capabilities. Based on our findings, we introduce TAMA, a TAble LLM instruction-tuned from LLaMA 3.1 8B Instruct, which achieves performance on par with, or surpassing, GPT-3.5 and GPT-4 on table tasks, while maintaining strong out-of-domain generalization and general capabilities. Our findings highlight the potential for reduced data annotation costs and more efficient model development through careful hyperparameter selection.
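The recipe the abstract describes (a smaller learning rate and fewer instruction-tuning instances than conventional table fine-tuning) can be sketched as a pair of hypothetical hyperparameter configurations. All numeric values below are illustrative assumptions for comparison, not the paper's reported settings:

```python
# Hypothetical configurations contrasting conventional table instruction
# tuning with a TAMA-style recipe. Numeric values are illustrative
# assumptions only; the paper's actual settings may differ.

CONVENTIONAL_TABLE_TUNING = {
    "base_model": "LLaMA-3.1-8B-Instruct",
    "learning_rate": 2e-5,          # a common instruction-tuning default
    "num_train_instances": 150_000, # large annotated table corpus
}

TAMA_STYLE_TUNING = {
    "base_model": "LLaMA-3.1-8B-Instruct",
    "learning_rate": 1e-6,          # smaller LR to preserve general capabilities
    "num_train_instances": 2_500,   # fewer instances, lower annotation cost
}

def follows_tama_recipe(cfg: dict, conventional: dict) -> bool:
    """Check whether a configuration uses both a smaller learning rate and
    fewer training instances than the conventional baseline, reflecting the
    paper's central finding about hyperparameter choice."""
    return (cfg["learning_rate"] < conventional["learning_rate"]
            and cfg["num_train_instances"] < conventional["num_train_instances"])
```

The point of the contrast is that both knobs move in the same direction: the gentler update scale limits drift away from the base model's general capabilities, while the smaller training set reduces annotation cost.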