STAR: Semantic Table Representation with Header-Aware Clustering and Adaptive Weighted Fusion

📅 2026-01-22
🤖 AI Summary
This work addresses the difficulty of aligning the embeddings of natural language queries and structured tables, which stems from their semantic and structural disparities. The authors propose a table representation method based on semantic clustering and adaptive weighted fusion. Specifically, they introduce a header-aware K-means clustering approach that selects representative rows to construct a diverse partial table, then generate cluster-specific synthetic queries to cover the table's semantic space. Finally, an adaptive weighting strategy fuses the table and query embeddings for fine-grained query-table alignment. This approach addresses the coarse-grained sampling and simple fusion schemes of prior work, and achieves higher Recall than QGpT on all five benchmark datasets, supporting the effectiveness of the proposed mechanisms for table representation and retrieval.
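
The row-selection step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real text encoder, the `header: cell` serialization is an assumed form of "header-aware", and the plain K-means loop and nearest-to-centroid selection are generic choices.

```python
import hashlib
import math
import random


def toy_embed(text, dim=32):
    """Hypothetical stand-in for a real encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def serialize_row(headers, row):
    """Assumed header-aware serialization: pair every cell with its column header."""
    return " ; ".join(f"{h}: {c}" for h, c in zip(headers, row))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style K-means; initial centroids are k randomly sampled points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centroids), centroids


def select_representative_rows(headers, rows, k, seed=0):
    """Return indices of the rows closest to each cluster centroid, plus all labels."""
    k = min(k, len(rows))
    points = [toy_embed(serialize_row(headers, r)) for r in rows]
    labels, centroids = kmeans(points, k, seed=seed)
    chosen = []
    for j in range(k):
        members = [i for i, l in enumerate(labels) if l == j]
        if members:
            chosen.append(min(members, key=lambda i: sq_dist(points[i], centroids[j])))
    return sorted(chosen), labels


if __name__ == "__main__":
    headers = ["Country", "Capital", "Continent"]
    rows = [["France", "Paris", "Europe"], ["Spain", "Madrid", "Europe"],
            ["Japan", "Tokyo", "Asia"], ["China", "Beijing", "Asia"],
            ["Kenya", "Nairobi", "Africa"], ["Egypt", "Cairo", "Africa"]]
    chosen, labels = select_representative_rows(headers, rows, k=3)
    partial_table = [rows[i] for i in chosen]
    print(chosen, labels, partial_table)
```

The selected rows form the partial table; in the described pipeline, each cluster would then be used to prompt a language model for cluster-specific synthetic queries, a step omitted here.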

📝 Abstract
Table retrieval is the task of retrieving the most relevant tables from large-scale corpora given natural language queries. However, structural and semantic discrepancies between unstructured text and structured tables make embedding alignment particularly challenging. Recent methods such as QGpT attempt to enrich table semantics by generating synthetic queries, yet they still rely on coarse partial-table sampling and simple fusion strategies, which limit semantic diversity and hinder effective query-table alignment. We propose STAR (Semantic Table Representation), a lightweight framework that improves semantic table representation through semantic clustering and weighted fusion. STAR first applies header-aware K-means clustering to group semantically similar rows and selects representative centroid instances to construct a diverse partial table. It then generates cluster-specific synthetic queries to comprehensively cover the table's semantic space. Finally, STAR employs weighted fusion strategies to integrate table and query embeddings, enabling fine-grained semantic alignment. This design enables STAR to capture complementary information from structured and textual sources, improving the expressiveness of table representations. Experiments on five benchmarks show that STAR achieves consistently higher Recall than QGpT on all datasets, demonstrating the effectiveness of semantic clustering and adaptive weighted fusion for robust table representation. Our code is available at https://github.com/adsl135789/STAR.
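
As a rough illustration of the fusion step, the sketch below combines one table embedding with several synthetic-query embeddings. The abstract does not give the fusion formulas, so everything here is an assumption: `alpha`, `temperature`, the mean-pooled fixed variant, and the softmax-over-cosine-similarity adaptive variant are hypothetical choices, and both functions expect a non-empty list of query embeddings.

```python
import math


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fixed_weight_fusion(table_emb, query_embs, alpha=0.5):
    """Assumed baseline: alpha * table + (1 - alpha) * mean(query embeddings)."""
    mean_q = [sum(col) / len(query_embs) for col in zip(*query_embs)]
    return l2_normalize([alpha * t + (1 - alpha) * m
                         for t, m in zip(table_emb, mean_q)])


def adaptive_weight_fusion(table_emb, query_embs, alpha=0.5, temperature=0.1):
    """Assumed adaptive variant: weight each query by softmax(cosine to table / T)."""
    t = l2_normalize(table_emb)
    qs = [l2_normalize(q) for q in query_embs]
    scores = [dot(t, q) / temperature for q in qs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * q[d] for w, q in zip(weights, qs)) for d in range(len(t))]
    fused = l2_normalize([alpha * a + (1 - alpha) * b for a, b in zip(t, pooled)])
    return fused, weights


if __name__ == "__main__":
    table = [1.0, 0.0]
    queries = [[1.0, 0.0], [0.0, 1.0]]
    print(fixed_weight_fusion(table, queries))
    print(adaptive_weight_fusion(table, queries))
```

In this sketch, queries that lie closer to the table embedding receive larger weights, so an off-topic synthetic query contributes less to the final representation than it would under simple averaging.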
Problem

Research questions and friction points this paper is trying to address.

table retrieval
semantic alignment
structured data
natural language query
table representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic table representation
header-aware clustering
adaptive weighted fusion
synthetic query generation
table retrieval
Authors

Shui-Hsiang Hsu
National Chung Hsing University, Smart Sustainable New Agriculture Research Center (SMARTer), Taichung, Taiwan
Tsung-Hsiang Chou
National Chung Hsing University, Smart Sustainable New Agriculture Research Center (SMARTer), Taichung, Taiwan
Chen-Jui Yu
National Chung Hsing University, Smart Sustainable New Agriculture Research Center (SMARTer), Taichung, Taiwan
Yao-Chung Fan
National Chung Hsing University, Taiwan
Natural Language Processing, Data Mining, Natural Language Generation