🤖 AI Summary
Existing tabular learning approaches struggle to integrate enterprise-scale multi-table data with its semantic context, which limits the semantic reasoning capabilities of foundation models for structured data. This work proposes SALT-KG, a benchmark that links multi-table transactional data with a metadata knowledge graph, termed OBKG, that encodes field descriptions, relational dependencies, and business object types. The benchmark thereby supports a semantics-aware approach to tabular learning: it enables evaluation of models that reason jointly over tabular evidence and semantic context. Experimental results show that metadata yields only modest gains on conventional prediction metrics, yet it exposes models’ deficiencies in leveraging semantics within relational contexts, providing a benchmark and an empirical foundation for research on semantically linked tabular data.
📝 Abstract
Building upon the SALT benchmark for relational prediction (Klein et al., 2024), we introduce SALT-KG, a benchmark for semantics-aware learning on enterprise tables. SALT-KG extends SALT by linking its multi-table transactional data with structured Operational Business Knowledge, represented as a Metadata Knowledge Graph (OBKG) that captures field-level descriptions, relational dependencies, and business object types. This extension enables evaluation of models that jointly reason over tabular evidence and contextual semantics, an increasingly critical capability for foundation models on structured data. Empirical analysis reveals that while metadata-derived features yield modest improvements in classical prediction metrics, they consistently expose gaps in the ability of models to leverage semantics in a relational context. By reframing tabular prediction as semantics-conditioned reasoning, SALT-KG establishes a benchmark to advance tabular foundation models grounded in declarative knowledge, providing the first empirical step toward semantically linked tables in structured data at enterprise scale.
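To make the idea of linking table fields to a metadata graph concrete, the following is a minimal sketch and not the SALT-KG data format or the authors' method. It assumes the metadata can be viewed as subject-predicate-object triples; the table names, column names, predicates, and helper functions are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of a toy metadata graph (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical metadata: a field description, a relational dependency,
# and business object types. None of these names come from the paper.
METADATA = [
    Triple("SalesDocument.CustomerID", "hasDescription",
           "Identifier of the ordering customer"),
    Triple("SalesDocument.CustomerID", "references", "Customer.CustomerID"),
    Triple("SalesDocument", "isBusinessObjectType", "SalesOrder"),
    Triple("Customer", "isBusinessObjectType", "BusinessPartner"),
]


def describe_field(table, column, graph):
    """Collect metadata attached to a column or to its parent table."""
    key = f"{table}.{column}"
    context = {}
    for triple in graph:
        if triple.subject == key or triple.subject == table:
            context[triple.predicate] = triple.obj
    return context


def serialize_row(table, row, graph):
    """Render a row as text, pairing each value with its field description."""
    parts = []
    for column, value in row.items():
        description = describe_field(table, column, graph).get(
            "hasDescription", "no description")
        parts.append(f"{column}={value} ({description})")
    return "; ".join(parts)


if __name__ == "__main__":
    print(serialize_row("SalesDocument",
                        {"CustomerID": "C042", "Amount": 10},
                        METADATA))
```

A model that receives the serialized row sees both the raw value and its declared meaning, which is one simple way to condition a prediction on semantics. How the benchmark itself exposes the graph to models is not specified in the abstract.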