Table as a Modality for Large Language Models

📅 2025-11-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing methods serialize tabular data into textual inputs for large language models (LLMs), causing severe loss of structural information and limiting performance on table reasoning tasks. To address this, we propose treating tables as an independent modality and introduce, for the first time, a hypergraph neural network (HGNN) to explicitly model the global structural semantics of tables, synergistically fused with an LLM within a novel multimodal framework—TAMO. This design avoids structural collapse inherent in conventional serialization, thereby significantly enhancing cross-modal understanding. Evaluated on five major benchmarks—HiTab, WikiTQ, WikiSQL, FeTaQA, and StructQA—TAMO achieves an average relative improvement of 42.65% over prior approaches. Moreover, it demonstrates superior generalization and robustness across diverse table reasoning scenarios, establishing new state-of-the-art performance.

Technology Category

Application Category

📝 Abstract
To migrate the remarkable successes of Large Language Models (LLMs), the community has made numerous efforts to generalize them to the table reasoning tasks for the widely deployed tabular data. Despite that, in this work, by showing a probing experiment on our proposed StructQA benchmark, we postulate that even the most advanced LLMs (such as GPTs) may still fall short of coping with tabular data. More specifically, the current scheme often simply relies on serializing the tabular data, together with the meta information, then inputting them through the LLMs. We argue that the loss of structural information is the root of this shortcoming. In this work, we further propose TAMO, which bears an ideology to treat the tables as an independent modality integrated with the text tokens. The resulting model in TAMO is a multimodal framework consisting of a hypergraph neural network as the global table encoder seamlessly integrated with the mainstream LLM. Empirical results on various benchmarking datasets, including HiTab, WikiTQ, WikiSQL, FeTaQA, and StructQA, have demonstrated significant improvements on generalization with an average relative gain of 42.65%.
Problem

Research questions and friction points this paper is trying to address.

LLMs struggle with tabular data reasoning
Structural information loss in serialized table inputs
Need multimodal integration of tables as separate modality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Treats tables as independent modality integrated with text tokens
Uses hypergraph neural network as global table encoder
Seamlessly integrates table encoder with mainstream large language model
🔎 Similar Papers
No similar papers found.
Liyao Li
Liyao Li
PhD Candidate, Zhejiang University
Table ReasoningLarge Tabular Language ModelMachine Learning
C
Chao Ye
Zhejiang University
Wentao Ye
Wentao Ye
Zhejiang University, Ant Research
LLMsMachine LearningMultimodality
Y
Yifei Sun
Zhejiang University
Z
Zhe Jiang
University of Michigan
Haobo Wang
Haobo Wang
Zhejiang University
Machine Learning
J
Jiaming Tian
Zhejiang University
Y
Yiming Zhang
Zhejiang University
N
Ningtao Wang
Ant Group
Xing Fu
Xing Fu
Ant Group
G
Gang Chen
Zhejiang University
J
Junbo Zhao
Zhejiang University